Suppose that $U = (U_1, \ldots, U_d)$ has a Uniform$([0,1]^d)$ distribution, that $Y = (Y_1, \ldots, Y_d)$ has the distribution $G$ on $\mathbb{R}_+^d$, and let $X = (X_1, \ldots, X_d) = (U_1 Y_1, \ldots, U_d Y_d)$. The resulting class of distributions of $X$ (as $G$ varies over all distributions on $\mathbb{R}_+^d$) is called the Scale Mixture of Uniforms class of distributions, and the corresponding class of densities on $\mathbb{R}_+^d$ is denoted by $\mathcal{F}_{\mathrm{SMU}}(d)$. We study maximum likelihood estimation in the family $\mathcal{F}_{\mathrm{SMU}}(d)$. We prove existence of the maximum likelihood estimator (MLE), establish Fenchel characterizations, and prove strong consistency of the almost surely unique MLE in $\mathcal{F}_{\mathrm{SMU}}(d)$. We also provide an asymptotic minimax lower bound for estimating the functional $f \mapsto f(x)$ under reasonable differentiability assumptions on $f \in \mathcal{F}_{\mathrm{SMU}}(d)$ in a neighborhood of $x$. We conclude the paper with discussion, conjectures and open problems pertaining to global and local rates of convergence of the MLE.

MARIOS G. PAVLIDES AND JON A. WELLNER

1. Introduction and summary. Fix a positive integer $k$, and suppose that $X_1, \ldots, X_n$ are i.i.d. random variables distributed according to a density in the convex family of $k$-monotone densities (with respect to Lebesgue measure) on $(0,\infty)$:

$$\mathcal{F}_k := \left\{ f_{k,G}(\cdot) = \int_0^\infty \frac{k\,(y-\cdot)_+^{k-1}}{y^k}\, dG(y) \;:\; G \in \mathcal{G}_1 \right\}, \tag{1.1}$$

where $\mathcal{G}_1$ denotes the set of all distribution functions on $(0,\infty)$ grounded at $0$. Here, we use the notation $x_+ = x \cdot 1_{[x>0]}$ for any $x \in \mathbb{R}$. It has been shown by Williamson [1956] that the family $\mathcal{F}_k$ is identifiably indexed by $\mathcal{G}_1$. In other words, if $G_1, G_2$ are distinct elements of $\mathcal{G}_1$, then $f_{k,G_1}(\cdot)$ and $f_{k,G_2}(\cdot)$ differ on a Lebesgue non-null set. Note that $\mathcal{F}_k$ is exactly the collection of all scale mixtures of Beta$(1,k)$ densities.
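To make (1.1) concrete, the Python sketch below evaluates $f_{k,G}$ when $G$ is a discrete distribution with finitely many atoms $y_j > 0$ and weights $w_j$. The function name `k_monotone_density` and the atom/weight representation are illustrative assumptions, not notation from the paper; the sketch simply sums the scaled Beta$(1,k)$ kernels $k(y_j - x)_+^{k-1}/y_j^k$.

```python
def k_monotone_density(x, k, atoms, weights):
    """Evaluate f_{k,G}(x) from (1.1) for a discrete mixing distribution G.

    Hypothetical helper: G puts mass weights[j] on atoms[j] > 0 (weights sum to 1).
    Each atom contributes the Beta(1, k) kernel rescaled to (0, y):
        k * (y - x)_+^(k - 1) / y^k.
    """
    if x <= 0:
        return 0.0
    total = 0.0
    for y, w in zip(atoms, weights):
        if x < y:  # (y - x)_+ vanishes unless x < y (strict, by convention at x == y)
            total += w * k * (y - x) ** (k - 1) / y ** k
    return total


# Example: an equal mixture of triangular (k = 2) densities on (0, 1) and (0, 2).
atoms, weights = [1.0, 2.0], [0.5, 0.5]
print(k_monotone_density(0.5, 2, atoms, weights))  # 0.5 + 0.375 = 0.875

# Midpoint-rule check that the mixture integrates to one over (0, 2].
h = 2.0 / 20000
mass = sum(k_monotone_density((i + 0.5) * h, 2, atoms, weights) * h for i in range(20000))
print(round(mass, 6))
```

With $k = 1$ each kernel is the uniform density $y_j^{-1} 1_{[x < y_j]}$, so the same helper produces scale mixtures of uniforms; the value exactly at an atom is a convention that does not affect the density as an element of $L_1$.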

The Beta$(1,1)$ distribution is the standard uniform distribution, $U(0,1)$. Therefore, the class $\mathcal{F}_1$ coincides with the class of all scale mixtures of uniform densities on $(0,\infty)$. A well-known theorem of Khintchine (see, e.g., Feller [1971, p. 158]) asserts that the class of densities on $(0,\infty)$ with concave distribution functions coincides with our class $\mathcal{F}_1$. It can be seen that $\mathcal{F}_1$ is also the class of all upper semi-continuous, non-increasing densities on $(0,\infty)$. This class is induced by order restrictions, a term we use to mean explicitly that there exists a partial ordering $(\ll)$ on the common support $\mathcal{X}$ of the densities in $\mathcal{F}_1$ such that $f \in \mathcal{F}_1$ if and only if $f$ is isotone with respect to this ordering, i.e., $f \in \mathcal{F}_1$ if and only if $f(x) \le f(y)$ whenever $x, y \in \mathcal{X}$ satisfy $x \ll y$. In this case, $(\ll)$ is the ordering $(\ge)$ on $(0,\infty)$.

Non-increasing, upper semi-continuous densities (in short, monotone densities) arise naturally via connections with renewal theory and uniform mixing (see, e.g., Woodroofe and Sun [1993]). Maximum likelihood estimation of monotone densities on $(0,\infty)$ was initiated by Grenander [1956a, b], with related work by Ayer et al. [1955], Brunk [1958] and van Eeden [1956a, b, c, 1957a, b]. Asymptotic theory of the MLE in $\mathcal{F}_1$ (the Grenander estimator) was developed by Prakasa Rao [1969], with later contributions by Groeneboom [1985, 1989], Birgé [1987, 1989] and Kim and Pollard [1990]. See Balabdaoui et al. [2010] for descriptions of the behavior of the Grenander estimator at zero.

Nonparametric estimation in families of densities described by order restrictions goes back at least to the work of Grenander [1956a, b], Brunk [1958, 1970] and Robertson [1967], with further development by Wegman [1969, 1970a, b] and Sager [1979, 1982]. Also see the books by Barlow et al. [1972] and by Robertson et al. [1988]. Polonik [1995a, b, 1997, 1998] addressed estimation in various order-restricted classes of multivariate densities from the perspective of the excess mass approach studied previously by, e.g., Sager [1979, 1982] and Müller and Sawitzki [1991]. Polonik shows that (under reasonable assumptions) the MLE in such classes exists and coincides with an estimator he constructs and calls the silhouette. If the elements of the class are required to be upper semi-continuous, the MLE is unique. Brunk [1958] also gives a graphical construction of the maximum likelihood estimator, and establishes $L_1$-consistency of the MLE.

In this paper our goal is to extend the notion of "monotone densities" to higher dimensions, i.e., to densities on $(0,\infty)^d$ with $d > 1$. Such an extension is not unique. For example, we may consider the family $\mathcal{F}_{\mathrm{BD}}(d)$ of "block-decreasing densities" (a term coined by Biau and Devroye [2003]) that contains all upper semi-continuous densities on $(0,\infty)^d$ that are non-increasing in each coordinate, while keeping all other coordinates fixed. This class was perhaps first introduced by Robertson [1967]. The particular proper subclass of $\mathcal{F}_{\mathrm{BD}}(d)$ studied here is the family $\mathcal{F}_{\mathrm{SMU}}(d)$ of all multivariate scale mixtures of uniform densities, i.e., the family of upper semi-continuous densities on $(0,\infty)^d$ of the form

$$f_G(x) = \int_{(0,\infty)^d} \frac{1}{|y|}\, 1_{(0,y]}(x)\, dG(y), \qquad x \in (0,\infty)^d, \tag{1.2}$$

for some $G \in \mathcal{G}_d$, the set of all distribution functions on $(0,\infty)^d$ that are grounded (zero) at $0$; here we use the notation $|y| = \prod_{i=1}^d y_i$ for $y = (y_1, \ldots, y_d)' \in (0,\infty)^d$. For any fixed $G \in \mathcal{G}_d$, it is clear that if $Y = (Y_1, \ldots, Y_d)'$ is distributed according to $G$ on $(0,\infty)^d$ and if $U_1, \ldots, U_d$ are i.i.d. $U(0,1)$ (and independent of $Y$), then the vector $X := (U_1 Y_1, \ldots, U_d Y_d)$ is distributed according to $f_G(\cdot)$ on $(0,\infty)^d$.

MULTIVARIATE MONOTONE DENSITIES
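The two descriptions of $\mathcal{F}_{\mathrm{SMU}}(d)$, through the density (1.2) and through the product representation $X = (U_1Y_1, \ldots, U_dY_d)$, can be illustrated with a short Python sketch for a discrete mixing distribution $G$. The helper names `smu_density` and `smu_sample`, and the restriction to finitely many atoms, are assumptions made for illustration only.

```python
import math
import random


def smu_density(x, atoms, weights):
    """f_G(x) in (1.2) for discrete G: sum_j w_j / |y_j| * 1{0 < x <= y_j}."""
    total = 0.0
    for y, w in zip(atoms, weights):
        if all(0 < xi <= yi for xi, yi in zip(x, y)):
            total += w / math.prod(y)  # |y| = y_1 * ... * y_d
    return total


def smu_sample(n, atoms, weights, rng):
    """Draw n vectors X = (U_1 Y_1, ..., U_d Y_d) with Y ~ G and U ~ Uniform([0, 1]^d)."""
    samples = []
    for _ in range(n):
        y = rng.choices(atoms, weights=weights, k=1)[0]
        samples.append(tuple(rng.random() * yi for yi in y))
    return samples


# Hypothetical G in d = 2: equal mass on the points (1, 1) and (2, 0.5).
atoms, weights = [(1.0, 1.0), (2.0, 0.5)], [0.5, 0.5]
print(smu_density((0.25, 0.25), atoms, weights))  # both boxes contain x: 0.5/1 + 0.5/1 = 1.0
print(smu_density((1.5, 0.25), atoms, weights))   # only the box (0, (2, 0.5)] contains x: 0.5
xs = smu_sample(1000, atoms, weights, random.Random(0))
```

Every draw lies in a box $(0, y_j]$ of some atom, and for a discrete $G$ the density $f_G$ is constant on the cells cut out by these boxes, which is one way to see that it is block-decreasing.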

Whereas the family $\mathcal{F}_{\mathrm{BD}}(d)$ is characterized by order restrictions (and thus the results of Polonik apply), its subclass $\mathcal{F}_{\mathrm{SMU}}(d)$ is not: as will be made explicit in Section 2, densities in the class $\mathcal{F}_{\mathrm{SMU}}(d)$ also satisfy non-negativity restrictions on their suitably signed $d$-dimensional differences over all rectangles. Because of this additional shape restriction, estimation in this family requires separate treatment.

A univariate parallel to the latter point is the family $\mathcal{F}_2$ in (1.1), induced by mixtures of triangular densities; this class is easily seen to be exactly the class of all non-increasing, convex (and hence continuous) densities on $(0,\infty)$. Thus $\mathcal{F}_2 \subset \mathcal{F}_1$ is not an order-constrained class of densities, in contrast to its superclass $\mathcal{F}_1$. Convex densities arise in connection with Poisson process models for bird migration and scale mixtures of triangular densities (see, e.g., Hampel [1987], Anevski [2003] and Lavee et al. [1991]). Estimation of non-increasing, convex densities on $(0,\infty)$ was apparently initiated by Anevski [1994] and was further pursued by Wang [1994], Jongbloed [1995] and Anevski [2003]. The asymptotic distribution theory and further characterizations of the nonparametric MLE of such a density and its first derivative at a fixed point (both under reasonable assumptions) were obtained by Groeneboom et al. [2001a, b]. These authors show that the local rate of convergence of the MLE of the functional $f \mapsto f(x)$ is of the order $n^{2/5}$, whereas the Grenander estimator (the MLE in $\mathcal{F}_1$) converges locally at the rate of only $n^{1/3}$.

Here is an outline of the remainder of the present paper. In Section 2 we provide characterizations of the family $\mathcal{F}_{\mathrm{SMU}}(d)$ that will prove useful in the sequel. Section 3 addresses existence and strong pointwise consistency, as well as $L_1$ and Hellinger consistency, of a sequence of maximum likelihood estimators in $\mathcal{F}_{\mathrm{SMU}}(d)$. In Section 4 we derive a local asymptotic minimax lower bound for estimation of $f(x)$ at a fixed point $x$ at which $f$ satisfies $\partial^d f(x)/(\partial x_1 \cdots \partial x_d) \neq 0$. The lower bound entails a rate of convergence of $n^{1/3}$ for all dimensions $d$ and yields a constant depending on $f$ which reduces to the known lower bound constant for $d = 1$. The paper concludes in Section 5 with a discussion of conjectures and open problems related to both the local (pointwise) and the global ($L_1$ and Hellinger) rates of convergence of the MLE in $\mathcal{F}_{\mathrm{SMU}}(d)$.

2. Properties of the Scale Mixture of Uniforms family of densities.

2.1. Properties of $\mathcal{F}_{\mathrm{SMU}}(d)$. A density function $f$ on $(0,\infty)^d$ will be called a (multivariate) scale mixture of uniform densities (SMU density) if there exists a distribution function $G$ on $(0,\infty)^d$ such that

$$f(x) = f_G(x) = \int_{(0,\infty)^d} \frac{1}{|v|}\, 1_{(0,v]}(x)\, dG(v) \tag{2.1}$$

$$= \int_{v \ge x} \frac{1}{|v|}\, dG(v) \qquad \text{for all } x \in (0,\infty)^d. \tag{2.2}$$

It is clear from (2.2) that a SMU density is also a block-decreasing density: $f_G(\cdot)$ is non-increasing in each coordinate, while keeping all other coordinates fixed. Also, the map $G \mapsto f_G$ is identifiable in the following sense: if $G_1 \neq G_2$, then $f_{G_1} \neq f_{G_2}$ on a set of positive Lebesgue measure; also see Theorem 2.3 below. The following lemma gives a formal statement and proof of a slightly more general result.

Lemma 2.1. Two upper semi-continuous and block-decreasing functions $f$ and $g$ on $\mathbb{R}^d$ either coincide everywhere in the interior of their support or else differ on a set of positive Lebesgue measure.

Proof. Assume that $x$ is in the interior of the support of both $f$ and $g$ and that $f(x) \neq g(x)$. Without loss of generality, assume that $f(x) > g(x)$. Since $g$ is upper semi-continuous, the set $\{y \mid g(y) < f(x)\}$ is $\|\cdot\|_2$-open and contains $x$, so there exists an $\epsilon > 0$ such that the $\|\cdot\|_2$-ball of radius $\epsilon$ around $x$, $B_{\|\cdot\|_2}(x,\epsilon)$, is a subset of $\{y \mid g(y) < f(x)\}$. In fact, $f$ and $g$ differ on the Lebesgue non-null set $A = \{y \le x \mid \|x - y\|_2 < \epsilon\}$, since $y \in A$ implies that $g(y) < f(x) \le f(y)$, and hence $g(y) < f(y)$; here we have also used the fact that $f$ is block-decreasing. The proof is complete. ■

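The distribution function $F_G$ corresponding to $X \sim f_G$ is given by

$$F_G(x) = \int_{(0,\infty)^d} \frac{|x \wedge v|}{|v|}\, dG(v), \tag{2.3}$$

where $\le$ denotes the natural partial ordering on $\mathbb{R}^d$, while

$$x \wedge v = (x_1, \ldots, x_d) \wedge (v_1, \ldots, v_d) = (\min\{x_1,v_1\}, \ldots, \min\{x_d,v_d\})$$

and

$$x \vee v = (x_1, \ldots, x_d) \vee (v_1, \ldots, v_d) = (\max\{x_1,v_1\}, \ldots, \max\{x_d,v_d\}).$$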
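Formula (2.3) follows from $P(U_iY_i \le x_i \mid Y) = \min\{x_i, Y_i\}/Y_i$ and the independence of the $U_i$. The Python sketch below compares (2.3) with a Monte Carlo estimate of $P(X \le x)$ for a hypothetical discrete $G$; the helper names and the chosen atoms are illustrative assumptions.

```python
import math
import random


def smu_cdf(x, atoms, weights):
    """F_G(x) from (2.3) for discrete G: sum_j w_j * |min(x, y_j)| / |y_j| (min taken coordinatewise)."""
    total = 0.0
    for y, w in zip(atoms, weights):
        meet = [min(xi, yi) for xi, yi in zip(x, y)]  # coordinatewise minimum of x and y_j
        total += w * math.prod(meet) / math.prod(y)
    return total


def monte_carlo_cdf(x, atoms, weights, n=20000, seed=0):
    """Estimate P(X <= x) by simulating X = (U_1 Y_1, ..., U_d Y_d)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y = rng.choices(atoms, weights=weights, k=1)[0]
        draw = [rng.random() * yi for yi in y]
        hits += all(di <= xi for di, xi in zip(draw, x))
    return hits / n


atoms, weights = [(1.0, 1.0), (2.0, 0.5)], [0.5, 0.5]
exact = smu_cdf((1.0, 1.0), atoms, weights)  # 0.5 * 1 + 0.5 * 0.5 = 0.75
approx = monte_carlo_cdf((1.0, 1.0), atoms, weights)
print(exact, approx)
```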

The distribution function $F_G$ of $X \sim f_G$ is generally not concave when $d > 1$, unlike the case $d = 1$. A SMU density (and, more generally, a block-decreasing density) may diverge at the origin; nevertheless, the pointwise bound $f(x) \le 1/|x|$ holds since, for $x \in (0,\infty)^d$, we have

$$1 = \int_{(0,\infty)^d} f(y)\, dy \ge \int_{(0,x]} f(y)\, dy \ge |x|\, f(x).$$

Further, a $d$-dimensional analogue of the proof of Devroye [1986, Theorem 6.2, p. 173] can be used to show that

$$\lim_{|x| \to \infty} \{|x|\, f(x)\} = \lim_{x \downarrow 0} \{|x|\, f(x)\} = 0, \tag{2.4}$$

whenever $f$ is a block-decreasing density on $(0,\infty)^d$.

For any two points $x, y \in [0,\infty)^d$ such that $x \le y$, we write $[x,y] = [x_1,y_1] \times \cdots \times [x_d,y_d]$, $[x,y) = [x_1,y_1) \times \cdots \times [x_d,y_d)$, $(x,y] = (x_1,y_1] \times \cdots \times (x_d,y_d]$ and $(x,y) = (x_1,y_1) \times \cdots \times (x_d,y_d)$ for the natural closed, lower-closed upper-open, lower-open upper-closed, and open rectangles, respectively. Note that the closed rectangle $[x,y]$ has (at most) $2^d$ vertices, the points $u = (u_1, \ldots, u_d)$ where each $u_i$ is either $x_i$ or $y_i$. Following Billingsley [1995], we write $\mathrm{sgn}_{[x,y]}(u) \in \{-1, 1\}$ for the signum of the vertex $u$, which is $-1$ or $+1$ according as the number of $i$, $1 \le i \le d$, satisfying $u_i = x_i$ is odd or even, respectively. Thus any two vertices defining an edge of the rectangle have opposite signs. Then, if $u = (u_1, \ldots, u_d)$ is some vertex of $[x,y]$ and $\delta \in \{-1, +1\}$ is its signum, $(\delta, u)$ is an element of the set

$$\Delta_d[x,y] = \left\{ \left( (-1)^{\sum_{i=1}^d 1_{[u_i = x_i]}},\ u \right) \;:\; u \in \{x_1,y_1\} \times \cdots \times \{x_d,y_d\} \right\}.$$

Definition 2.1. For an upper semicontinuous and coordinatewise decreasing function $g : (0,\infty)^d \to [0,\infty)$, define the $g$-volume of a (possibly degenerate) rectangle $[x,y)$ by

$$V_g[x,y) = \sum_{(\delta,u) \in \Delta_d[x,y]} \{\delta\, g(u)\}, \tag{2.5}$$

provided that $g$ is defined and is finite for all $u$ in the summation. Correspondingly, for an upper semicontinuous and coordinatewise increasing function $g : (0,\infty)^d \to [0,\infty)$, we define the $g$-volume of a rectangle $(x,y]$ by the sum on the right side of (2.5).
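The alternating sum (2.5) is easy to mis-sign, so here is a minimal Python sketch of it under the convention above (a vertex gets sign $-1$ when an odd number of its coordinates come from the lower corner $x$). The helper `g_volume` is hypothetical; it records which corner each coordinate comes from, so degenerate rectangles with $x_i = y_i$ are handled correctly. For a product function $g(u) = \prod_i g_i(u_i)$ the sum factorizes as $\prod_i \{g_i(y_i) - g_i(x_i)\}$, which the example uses as a check.

```python
import itertools
import math


def g_volume(g, x, y):
    """V_g[x, y) from (2.5): sum over the 2^d vertices u of [x, y] of sgn(u) * g(u).

    picks[i] == 0 means u_i = x_i and picks[i] == 1 means u_i = y_i; the sign is
    (-1) ** (number of coordinates taken from the lower corner x).
    """
    total = 0.0
    for picks in itertools.product((0, 1), repeat=len(x)):
        u = tuple(y[i] if p else x[i] for i, p in enumerate(picks))
        total += (-1) ** picks.count(0) * g(u)
    return total


def g(u):
    """A product function exp(-u_1 - u_2), used only to test the sign convention."""
    return math.exp(-sum(u))


x, y = (0.5, 1.0), (1.0, 2.0)
vol = g_volume(g, x, y)
expected = (math.exp(-1.0) - math.exp(-0.5)) * (math.exp(-2.0) - math.exp(-1.0))
print(vol, expected)
```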

It is easily seen that for a SMU density $f_G$, the $f_G$-volume of any rectangle $[x,y)$ is either zero or has the sign of $(-1)^d$. Indeed, consider (2.2) and observe that

$$(-1)^d V_{f_G}[x,y) = \int_{[x,y)} \frac{1}{|v|}\, dG(v) \ge 0. \tag{2.6}$$

From (2.6), or, alternatively, from the fact that the class of sets $[x,y)$ is a $\pi$-system which generates the Borel $\sigma$-field of subsets of $[0,\infty)^d$ and then extending as in Billingsley [1995], it is clear that $(-1)^d V_f$ extends uniquely to a (non-negative) measure on the Borel $\sigma$-field $\mathcal{B}^d_+ = \mathcal{B}^d \cap [0,\infty)^d$ given by

$$(-1)^d V_f(A) = \int_A \frac{1}{|v|}\, dG(v) \qquad \text{for } A \in \mathcal{B}^d_+;$$

in particular,

$$(-1)^d V_f(x,y] = \int_{(x,y]} \frac{1}{|v|}\, dG(v).$$
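Identity (2.6) can be checked numerically for a discrete mixing distribution, where the right side reduces to $\sum_j w_j/|v_j|$ over the atoms $v_j \in [x,y)$. The sketch below repeats hypothetical helpers for (1.2) and (2.5) so that it runs on its own, and compares both sides of (2.6) on randomly drawn rectangles; the atoms, weights and random rectangles are illustrative choices.

```python
import itertools
import math
import random


def smu_density(u, atoms, weights):
    """f_G(u) = sum_j w_j / |v_j| * 1{u <= v_j} for a discrete G."""
    return sum(w / math.prod(v) for v, w in zip(atoms, weights)
               if all(ui <= vi for ui, vi in zip(u, v)))


def g_volume(g, x, y):
    """Alternating vertex sum V_g[x, y) as in (2.5)."""
    total = 0.0
    for picks in itertools.product((0, 1), repeat=len(x)):
        u = tuple(y[i] if p else x[i] for i, p in enumerate(picks))
        total += (-1) ** picks.count(0) * g(u)
    return total


def rectangle_mass(x, y, atoms, weights):
    """Right side of (2.6): sum of w_j / |v_j| over atoms v_j in [x, y)."""
    return sum(w / math.prod(v) for v, w in zip(atoms, weights)
               if all(xi <= vi < yi for xi, vi, yi in zip(x, v, y)))


atoms = [(1.0, 1.0), (2.0, 0.5), (0.4, 1.7)]
weights = [0.3, 0.5, 0.2]


def f(u):
    return smu_density(u, atoms, weights)


rng = random.Random(1)
max_gap = 0.0
for _ in range(200):
    x = tuple(rng.uniform(0.05, 2.5) for _ in range(2))
    y = tuple(xi + rng.uniform(0.05, 1.0) for xi in x)
    lhs = (-1) ** 2 * g_volume(f, x, y)  # d = 2
    max_gap = max(max_gap, abs(lhs - rectangle_mass(x, y, atoms, weights)))
print(max_gap)  # expected to be at floating-point rounding level
```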

The following lemma extends this argument to an arbitrary upper semicontinuous function $g$ for which $(-1)^d V_g[x,y) \ge 0$ for all rectangles $[x,y)$.

Lemma 2.2. Suppose that $g$ is a non-negative, upper semi-continuous function satisfying $(-1)^d V_g[x,y) \ge 0$ for all lower-closed upper-open rectangles $[x,y)$, and vanishing if any coordinate tends to $\infty$. Then $(-1)^d V_g$ can be extended to a countably additive measure on $\mathcal{B}^d_+$.

Proof. Since the class of all rectangles of the form $[x,y)$ is a $\pi$-system which generates $\mathcal{B}^d_+$, this follows immediately from the analogue of Billingsley [1995] with obvious modifications (replace Billingsley's sets $A$ with our sets $[x,y)$ and $F$ with $F(x) = V_g[x,\infty)$, continuous from below). ■

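It is easy to exhibit a block-decreasing density that is not a SMU density: consider the uniform density on the closed triangle in $\mathbb{R}^2$ with vertices $(0,0)$, $(0,1)$ and $(1,0)$, which equals $2$ on the triangle. Of the four vertices of the rectangle $[(1/8,1/8),(1/2,3/4)]$, only $(1/2,3/4)$ lies outside the triangle, so

$$(-1)^2 V_f[(1/8,1/8),(1/2,3/4)) = 0 - 2 - 2 + 2 = -2 < 0,$$

showing that this density is not a SMU density, even though it is block-decreasing.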
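The computation in this example can be reproduced with a direct implementation of the alternating sum (2.5). In the hypothetical sketch below, `triangle_density` encodes the uniform density on the closed triangle and `g_volume` is the same illustrative helper used for (2.5).

```python
import itertools


def g_volume(g, x, y):
    """Alternating vertex sum V_g[x, y) as in (2.5)."""
    total = 0.0
    for picks in itertools.product((0, 1), repeat=len(x)):
        u = tuple(y[i] if p else x[i] for i, p in enumerate(picks))
        total += (-1) ** picks.count(0) * g(u)
    return total


def triangle_density(u):
    """Uniform density on the closed triangle with vertices (0,0), (0,1), (1,0)."""
    u1, u2 = u
    return 2.0 if u1 >= 0 and u2 >= 0 and u1 + u2 <= 1 else 0.0


vol = (-1) ** 2 * g_volume(triangle_density, (1 / 8, 1 / 8), (1 / 2, 3 / 4))
print(vol)  # -2.0
```

A negative value for a single rectangle is enough to rule out membership in $\mathcal{F}_{\mathrm{SMU}}(2)$, since (2.6) forces every such volume to be non-negative.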

The following theorem establishes identifiability of the mixing distribution $G$ and provides a useful characterization of SMU densities.

Theorem 2.3.

(a) For the class of SMU densities $\mathcal{F}_{\mathrm{SMU}}(d) = \{f_G : G \in \mathcal{G}_d\}$ with $f_G$ as given in (2.1), $f \in \mathcal{F}_{\mathrm{SMU}}(d)$ if and only if $f = f_G$, where $G \in \mathcal{G}_d$ is given by

$$G(x) = \int_{(0,\infty)^d} (-1)^d V_f(u,x] \cdot 1_{[u < x]}\, du. \tag{2.7}$$

Thus there is a one-to-one correspondence between $G \in \mathcal{G}_d$ and $f_G \in \mathcal{F}_{\mathrm{SMU}}(d)$.

(b) Suppose that the Lebesgue density $f$ on $(0,\infty)^d$ converges to zero as any one coordinate tends to $\infty$, while all other coordinates are kept fixed. Then $f$ is a SMU density if and only if $(-1)^d V_f[x,y) \ge 0$ for all $0 < x < y$.

Proof. (a) Suppose that $f = f_G$, for $G \in \mathcal{G}_d$ (recall that this implies that $G(0) = 0$), is a SMU density, evaluated at an arbitrary $x \in (0,\infty)^d$ as

$$f(x) = \int_{(0,\infty)^d} \frac{1}{|y|}\, 1_{(0,y]}(x)\, dG(y) = \int_{y_1 \ge x_1} \cdots \int_{y_d \ge x_d} \frac{1}{|y|}\, dG(y), \tag{2.8}$$

so that $df(x) = (-1)^d |x|^{-1}\, dG(x)$ and thus

$$\begin{aligned}
G(x) &= \int 1_{(0,x]}(y)\, |y|\, d\{(-1)^d f(y)\} \\
&= \int \int 1_{(0,x]}(y)\, 1_{(0,y]}(u)\, du\, d\{(-1)^d f(y)\} \\
&= \int_{(0,x]} \left\{ \int_{y \in (u,x]} d\{(-1)^d f(y)\} \right\} du \\
&= \int_{(0,x]} (-1)^d V_f(u,x]\, du,
\end{aligned}$$

where the second to last equality follows by Fubini-Tonelli.

We will now show that $G$ is unique. Suppose that (2.8) holds for $G = G_i \in \mathcal{G}_d$, $i = 1, 2$. Recall that this implies that $G_1(0) = G_2(0) = 0$ and, thus, $G_0(\cdot) = G_1(\cdot) - G_2(\cdot)$ is such that $G_0(0) = 0$, $\int_{(0,\infty)^d} dG_0(x) = 0$ and

$$0 = \int_{[0,\infty)^d} \frac{1}{|y|}\, 1_{(0,y]}(x)\, dG_0(y) = \int_{[x,\infty)} \frac{1}{|y|}\, dG_0(y) \tag{2.9}$$

holds for all $x \in (0,\infty)^d$. Thus the signed measure $|y|^{-1}\, dG_0(y)$ vanishes on every set $[x,\infty)$, so necessarily $G_0(x)$ is independent of $x$ and therefore everywhere equal to its value at $0$, namely $G_0(0) = 0$. This completes the proof of uniqueness, since $G_1 = G_2$.
(b) If $f$ is in $\mathcal{F}_{\mathrm{SMU}}(d)$, there exists $G \in \mathcal{G}_d$ such that

$$f(x) = \int_{(0,\infty)^d} \frac{1}{|v|}\, 1_{(0,v]}(x)\, dG(v) = \int_{v \ge x} \frac{1}{|v|}\, dG(v),$$

so that it is easily seen that $(-1)^d V_f[x,y) = \int_{[x,y)} |v|^{-1}\, dG(v) \ge 0$ holds for all $0 < x < y$.

Conversely, assume that the Lebesgue density $f$ converges to zero as any one coordinate tends to $\infty$, while all other coordinates are kept fixed, and satisfies $(-1)^d V_f[x,y) \ge 0$ for all $0 < x < y$. First, observe that, by Lemma 2.2, this implies that for $x_1 \le x_2 \le x$, elements of $(0,\infty)^d$, we have

$$(-1)^d V_f[x_1,x) \ge (-1)^d V_f[x_2,x)$$

and, letting $x \to \infty$, this yields $f(x_1) \ge f(x_2)$, because we assumed that $f$ vanishes as any one of its coordinates diverges to infinity, so that $V_f[x_i,x) \to (-1)^d f(x_i)$ for $i \in \{1,2\}$. Thus $f$ is block-decreasing.

Hence, by appealing to part (a), it suffices to show that $G$, as defined on $(0,\infty)^d$ by (2.7), is a valid distribution function.
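(i) $G$ is grounded at $0$, trivially by inspection: $G(0) = 0$.

(ii) Notice that

$$\begin{aligned}
\lim_{x_1,\ldots,x_d \to \infty} G(x_1,\ldots,x_d) &= \lim_{n \to \infty} \{G(n\mathbf{1})\} \\
&= \lim_{n \to \infty} \int_{(0,\infty)^d} (-1)^d V_f(u, n\mathbf{1}]\, 1_{[u < n\mathbf{1}]}\, du \\
&= (-1)^d \int_{(0,\infty)^d} \lim_{n \to \infty} \{V_f(u, n\mathbf{1}]\} \lim_{n \to \infty} \{1_{[u < n\mathbf{1}]}\}\, du \\
&= (-1)^d \int_{(0,\infty)^d} (-1)^d f(u+)\, du = \int_{(0,\infty)^d} f(u)\, du = 1,
\end{aligned}$$

where in the steps above we have used the fact that, for each fixed $u \in (0,\infty)^d$, the sequence $\chi_n(u) := (-1)^d V_f(u, n\mathbf{1}]\, 1_{[u < n\mathbf{1}]}$ is non-negative and increasing in $n \in \mathbb{N}$, applied the monotone convergence theorem, and noted that $\lim_{n \to \infty} \{1_{[u < n\mathbf{1}]}\} = 1$ for any fixed $u \in (0,\infty)^d$, and that

$$\lim_{n \to \infty} \{V_f(u, n\mathbf{1}]\} = \lim_{n \to \infty} \sum_{(\delta,v) \in \Delta_d[u, n\mathbf{1}]} \delta\, f(v) = (-1)^d f(u+)$$

because

$$0 \le \lim_{|x| \to \infty} f(x) \le \lim_{|x| \to \infty} \{1/|x|\} = 0,$$

since $f$ is block-decreasing. Finally, the proof of (ii) is complete as soon as we observe that $(-1)^{2d} = 1$ and that $\int_{(0,\infty)^d} f(u)\, du = 1$, since $f$ is a density.

(iii) Now, fix $0 < x < y$ and note that (since $G$ is an increasing upper semicontinuous function)

$$\begin{aligned}
V_G(x,y] &= \sum_{(\delta,v) \in \Delta_d[x,y]} \{\delta\, G(v)\} \\
&= (-1)^d \int_{(0,\infty)^d} \sum_{(\delta,v) \in \Delta_d[x,y]} \{\delta\, V_f(u,v]\, 1_{[u < v]}\}\, du \\
&= \int_{(0,y]} (-1)^d V_f(u \vee x, y]\, du \ge 0,
\end{aligned}$$

by geometric inspection and Lemma 2.2. ■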
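As a numerical illustration of the inversion formula (2.7), the Python sketch below recovers $G(x)$ from $f_G$ for a hypothetical two-atom $G$ in $d = 2$, approximating the integral by a midpoint rule. It is only a sanity check under stated assumptions: we evaluate at points $x$ whose coordinates are multiples of the step $h$ and such that no atom of $G$ lies on the boundary of $(0,x]$, since at such boundary points the value depends on the open/closed conventions for the rectangles. The helper names are illustrative.

```python
import itertools
import math


def smu_density(u, atoms, weights):
    """f_G(u) = sum_j w_j / |v_j| * 1{u <= v_j} for a discrete G."""
    return sum(w / math.prod(v) for v, w in zip(atoms, weights)
               if all(ui <= vi for ui, vi in zip(u, v)))


def volume(g, lo, hi):
    """Alternating vertex sum of g over the rectangle with corners lo <= hi."""
    total = 0.0
    for picks in itertools.product((0, 1), repeat=len(lo)):
        vertex = tuple(hi[i] if p else lo[i] for i, p in enumerate(picks))
        total += (-1) ** picks.count(0) * g(vertex)
    return total


def recover_G(x, atoms, weights, h=0.05):
    """Midpoint rule for (2.7) in d = 2: G(x) = int (-1)^2 V_f(u, x] 1[u < x] du.

    Assumes x_1 and x_2 are (numerically) multiples of h; every midpoint u satisfies u < x.
    """
    def f(u):
        return smu_density(u, atoms, weights)

    n1, n2 = round(x[0] / h), round(x[1] / h)
    total = 0.0
    for k1 in range(n1):
        for k2 in range(n2):
            u = ((k1 + 0.5) * h, (k2 + 0.5) * h)
            total += volume(f, u, x) * h * h  # (-1)^2 = 1 in d = 2
    return total


atoms, weights = [(1.0, 1.0), (2.0, 0.5)], [0.4, 0.6]
print(recover_G((1.5, 1.5), atoms, weights))  # only the atom (1, 1) lies below x: about 0.4
print(recover_G((2.5, 1.5), atoms, weights))  # both atoms lie below x: about 1.0
```

For a discrete $G$ the integrand is piecewise constant with jumps at the atom coordinates, so when those coordinates fall on cell boundaries the midpoint rule reproduces (2.7) up to rounding.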

2.2. Lebesgue measurability of block-decreasing functions. Now we establish a technical fact concerning the (Lebesgue) measurability of block-decreasing functions which will be needed in our proofs in Section 3.2. We begin with a definition and then a lemma.

Definition 2.2. We call a subset $C$ of $\mathbb{R}^d$ a "defective rectangle" if and only if there exist real numbers $a_i < b_i$ for $i = 1, 2, \ldots, d$, such that

$$(a_1,b_1) \times \cdots \times (a_d,b_d) \subseteq C \subseteq [a_1,b_1] \times \cdots \times [a_d,b_d].$$

Thus, by definition, a defective rectangle is a compact rectangle in $\mathbb{R}^d$ minus a potentially non-void subset of its boundary. In our definition, a defective rectangle is taken to be both bounded and non-degenerate.

Lemma 2.4. Any union of defective rectangles in $\mathbb{R}^d$ is a Lebesgue set.

Proof. Let $\mathcal{C} = \{C_j \mid j \in J\}$ be a family of defective rectangles in $\mathbb{R}^d$, indexed by some set $J$. For each $j \in J$ let the real numbers $a_{ij} < b_{ij}$, for $i \in \{1, 2, \ldots, d\}$, be uniquely determined by

$$(a_{1j},b_{1j}) \times \cdots \times (a_{dj},b_{dj}) \subseteq C_j \subseteq [a_{1j},b_{1j}] \times \cdots \times [a_{dj},b_{dj}].$$

For any $x \in \mathbb{R}^d$ and $\epsilon > 0$ let $B(x,\epsilon)$ denote the open $\|\cdot\|_2$-ball centered at $x$ with radius $\epsilon$. Let also $\lambda^*$ denote outer Lebesgue measure on $\mathbb{R}^d$ and $\lambda$ its restriction to the Lebesgue sets.

Let $A = \bigcup_{j \in J} C_j$ denote the union of the elements of $\mathcal{C}$, and consider the set

$$O = \bigcup_{j \in J} (a_{1j},b_{1j}) \times \cdots \times (a_{dj},b_{dj}) \subseteq A.$$

Since $\mathrm{int}(C_j) = (a_{1j},b_{1j}) \times \cdots \times (a_{dj},b_{dj})$ for each $j \in J$ and an arbitrary union of open sets is open, $O$ is an open set. Hence, to show that $A$ is a Lebesgue set, it suffices to show that $\lambda^*(A \setminus O) = 0$, from which one concludes that $\Gamma = A \setminus O$ is a Lebesgue-null set and hence that $A = O \cup \Gamma$ is a Lebesgue set also.

Notice that if $\Gamma = \emptyset$ there is nothing to show. Now, given $\Gamma \neq \emptyset$, fix an arbitrary element $y \in \Gamma$ and observe that there exists an index $k \in J$ such that $y \in C_k \setminus \mathrm{int}(C_k)$; i.e., $y$ lies on the boundary $\partial\,\mathrm{cl}(C_k)$, where $\lambda(\mathrm{cl}(C_k)) = \prod_{i=1}^d (b_{ik} - a_{ik}) > 0$. Since $y$ lies on at most $d$ of the faces of the box $\mathrm{cl}(C_k)$, we have that

$$\frac{\lambda(\mathrm{int}(C_k) \cap B(y,\epsilon))}{\lambda(B(y,\epsilon))} \ge \left(\frac{1}{2}\right)^d$$

holds for all sufficiently small $\epsilon > 0$ (for instance, for all $\epsilon$ smaller than the distance from $y$ to every face of $\mathrm{cl}(C_k)$ not containing $y$). This observation, in conjunction with the fact that $\mathrm{int}(C_k) \subseteq O \subseteq \Gamma^c$, immediately yields

$$\limsup_{\epsilon \downarrow 0} \frac{\lambda^*(\Gamma \cap B(y,\epsilon))}{\lambda(B(y,\epsilon))} \le 1 - \left(\frac{1}{2}\right)^d < 1.$$

The last inequality, and the fact that $y \in \Gamma$ was arbitrary, show (by appealing to the Lebesgue density theorem; see e.g. Cohn [1980, Corollary 6.2.6, p. 184]) that $\Gamma$ contains no density points and is consequently a Lebesgue-null set. ■

With this lemma at hand we are ready to prove Lebesgue measurability 
of non-negative, block-decreasing functions that vanish at infinity. 

Proposition 2.5. Let $f$ be a real-valued, non-negative function on $(0,\infty)^d$ that is non-increasing in each coordinate $x_j$ and converges to zero as $x_j \to \infty$, keeping all other coordinates fixed. Then:

(a) $f$ is Lebesgue-measurable.

(b) There exists such a function $f$ that is not Borel-measurable. Such an $f$ exists with $f$ also satisfying $\sup\{f(x) \mid x \in (0,\infty)^d\} < \infty$.

Proof. Proposition 2.5 follows from Theorem 3 of Lang [1986], but for completeness we give another proof here. (a) Note that the support of $f$, the closure of $[f > 0] = [x \in (0,\infty)^d \mid f(x) > 0]$, is a closed and thus a Borel set; hence it is also a Lebesgue set.

Fix $t > 0$; since $f$ is non-negative, block-decreasing and vanishes at infinity, $[f \ge t] = [x \in (0,\infty)^d \mid f(x) \ge t]$ has the form

$$[f \ge t] = \bigcup_{x \in A_t} C_x$$

for some (non-unique) subset $A_t$ of $(0,\infty)^d$, where

$$C_x \in \{(0,x],\ (0,x] \setminus \{x\}\}$$

is a defective rectangle (by Definition 2.2), for each $x \in A_t$. Hence it follows by Lemma 2.4 that $[f \ge t]$ is a Lebesgue set. Since the argument above holds for all $t > 0$, and $[f \ge t] = (0,\infty)^d$ for $t \le 0$, the proof of Lebesgue-measurability of $f$ is complete, since the class of sets $\{[t,\infty) \mid t \in \mathbb{R}\}$ generates the Borel $\sigma$-field.

(b) We shall provide a counter-example in two dimensions, $d = 2$; for higher dimensions, analogous counter-examples can be constructed. As soon as we convince ourselves that a non-Borel subset $A$ of $\Lambda = \{(x, 1-x) \in (0,1)^2 \mid 0 < x < 1\}$ exists, we construct $f$ on $(0,\infty)^2$, satisfying $\sup\{f(x) \mid x \in (0,\infty)^2\} < \infty$, by $f(\cdot) = 1_{\bar{A}}(\cdot)$, where

$$\bar{A} = \bigcup_{(x,y) \in A} (0,x] \times (0,y].$$

This $f$ is non-increasing in each coordinate and vanishes once a coordinate exceeds $1$. Notice then that $[f \ge 1] = \bar{A}$ is not a Borel set: $A$ is a non-Borel subset of the Borel set $\Lambda$, and it is an easy task to verify that $\bar{A} \cap \Lambda = A$, so Borel measurability of $\bar{A}$ would force $A$ to be Borel. Indeed, on one hand $A \subseteq \bar{A} \cap \Lambda$ follows directly from $A \subseteq \bar{A}$ and $A \subseteq \Lambda$. On the other hand, if $(x,y) \in \bar{A} \cap \Lambda$, there exists an $(x_0, y_0) \in A$ such that

$$0 < x, x_0, y_0, y < 1, \qquad x + y = x_0 + y_0 = 1, \qquad x \le x_0 \text{ and } y \le y_0.$$

Combining the above relationships we conclude that necessarily $(x,y) = (x_0,y_0) \in A$, and the proof of $\bar{A} \cap \Lambda = A$ is complete.

To conclude this counter-example we elaborate briefly on the existence of a non-Borel subset $A$ of $\Lambda$. In doing so, we follow steps as in Shorack [2000]. Let $D$ be a subset of $(0,1)$ that is not a Lebesgue set, the existence of which is guaranteed by Proposition 1.2.2 in Shorack [2000]. As in Example 7.1.1 of Shorack [2000], let $F$ be the Lebesgue singular distribution function that gives mass 1 to, and is one-to-one on, the Cantor set $C$. Let $B = F^{-1}(D) \cap C$, so that $B$ is a subset of the Cantor set $C$ and a Lebesgue-null set, as $B \subseteq C$ and $\lambda(C) = 0$. Let also $A = \{(x, 1-x) \mid x \in B\}$. We argue that $A$ so constructed is not a Borel subset of $\mathbb{R}^2$. Assume the contrary, i.e., assume that $A$ is in fact a Borel set. Since the vector-valued function $x \mapsto (x, 1-x)$ is a one-to-one, (Borel) measurable mapping on $(0,1)$, and $B$ is the inverse image of $A$ under this mapping, $B$ must also be a Borel set in $\mathbb{R}$. But then, since $F$ is non-decreasing, $F(B)$ is also a Borel set. In addition, since $F$ is one-to-one on $C$, we have that $D = F(B)$, and thus $D$ is a Borel and hence a Lebesgue set. This is a contradiction, because $D$ was taken to be a non-Lebesgue set. Hence $A$, so constructed, is indeed a non-Borel subset of $\mathbb{R}^2$. ■

3. Existence and Consistency of the MLE. Let $X_1, \ldots, X_n$ be i.i.d. random vectors distributed according to some density $f_0 = f_{G_0} \in \mathcal{F}_{\mathrm{SMU}}(d)$, where $f_0$ is unknown. Our goal is to estimate the unknown SMU density $f_0$ based on $X_1, \ldots, X_n$. We will be interested in maximizing the likelihood function $f \mapsto \prod_{i=1}^n f(X_i)$ or, equivalently, the log-likelihood function $f \mapsto n\mathbb{P}_n \log\{f(X)\}$ over $f \in \mathcal{F}_{\mathrm{SMU}}(d)$, where $\mathbb{P}_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$ is the empirical measure of the data. Any such maximizer $\hat f_n \in \mathcal{F}_{\mathrm{SMU}}(d)$, should one exist, will be called a (nonparametric) maximum likelihood estimator of $f_0$ based on $X_1, \ldots, X_n$. Since $f_0 = f_{G_0}$ is given by (2.1), it follows from Theorem 2.3 that estimation of $f_0 \in \mathcal{F}_{\mathrm{SMU}}(d)$ is equivalent to estimation of $G_0$.

3.1. On existence and uniqueness of an MLE. We begin with a definition 
followed by the main theorem of this subsection. 

Definition 3.1. [Rectangular grid generated by data] Suppose that $x_1, \ldots, x_n$ are (fixed or random) elements of $(0,\infty)^d$ and that $x_i = (x_{i1}, \ldots, x_{id})'$, $i = 1, 2, \ldots, n$. Define the matrix $A = [x_{ij}] \in \mathcal{M}_{n \times d}((0,\infty))$ whose $i$-th row is exactly $x_i'$, for $i \in \{1, 2, \ldots, n\}$. Also let

$$A^* = \{(x_{(i_1),1}, x_{(i_2),2}, \ldots, x_{(i_d),d}) \mid i_1, \ldots, i_d \in \{1, 2, \ldots, n\}\}$$

denote the rectangular grid generated by $A$, where $x_{(i),j}$ denotes the $i$-th smallest element among $x_{1j}, \ldots, x_{nj}$, for $i \in \{1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, d\}$. In particular, $x_* = (x_{(1),1}, x_{(1),2}, \ldots, x_{(1),d})$ and $x^* = (x_{(n),1}, x_{(n),2}, \ldots, x_{(n),d})$ denote the element-wise minimum and maximum of $x_1, \ldots, x_n$, respectively. For each fixed $j \in \{1, 2, \ldots, d\}$, let $n_j(A) := \mathrm{card}(\{x_{ij} \mid i = 1, 2, \ldots, n\})$, and notice that $\mathrm{card}(A^*) = \prod_{j=1}^d n_j(A)$.
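The grid $A^*$ of Definition 3.1 is the Cartesian product of the distinct coordinate values, which a short Python sketch makes explicit. The function name `rectangular_grid` is a hypothetical helper; ties within a coordinate are what make $n_j(A) < n$ possible.

```python
import itertools
import math


def rectangular_grid(data):
    """Grid A* generated by the rows of `data` (Definition 3.1).

    Coordinate j contributes its distinct values x_(1),j < ... < x_(n_j),j, and
    A* is their Cartesian product, so len(A*) = prod_j n_j(A).
    """
    d = len(data[0])
    axes = [sorted({row[j] for row in data}) for j in range(d)]
    return [tuple(p) for p in itertools.product(*axes)], axes


data = [(0.2, 0.5), (0.6, 0.3), (0.4, 0.8)]
grid, axes = rectangular_grid(data)
x_min = tuple(a[0] for a in axes)   # element-wise minimum x_*
x_max = tuple(a[-1] for a in axes)  # element-wise maximum x^*
print(len(grid), x_min, x_max)      # 9 (0.2, 0.3) (0.6, 0.8)
```

Every observation is itself a grid point, which is what allows Theorem 3.1(a) to place all atoms of $\hat G_n$ on $A^*$.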

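Theorem 3.1. [Existence and characterization of an MLE in $\mathcal{F}_{\mathrm{SMU}}(d)$]

(a) A maximum likelihood estimator (MLE) $\hat f_n = f_{\hat G_n} \in \mathcal{F}_{\mathrm{SMU}}(d)$ of $f_0 = f_{G_0} \in \mathcal{F}_{\mathrm{SMU}}(d)$ almost surely exists, where $\hat G_n \in \mathcal{G}_d$ is a purely atomic probability measure with at most $n$ atoms, all of which are concentrated on $A^*$, the rectangular grid generated by the data $X_1, \ldots, X_n$.

(b) For almost all $\omega$, the unique MLE $\hat f_n = f_{\hat G_n} \in \mathcal{F}_{\mathrm{SMU}}(d)$ is completely characterized by the following Fenchel conditions:

$$\mathbb{P}_n\left\{ \frac{1_{[X \le x]}}{\hat f_n(X)} \right\} \le |x| \qquad \text{for all } x \in (0,\infty)^d, \tag{3.1}$$

and

$$\mathbb{P}_n\left\{ \frac{1_{[X \le y]}}{\hat f_n(X)} \right\} = |y| \tag{3.2}$$

if and only if $y \in (0,\infty)^d$ satisfies $\hat G_n(\{y\}) > 0$, or, equivalently, $(-1)^d \lim_{\epsilon \downarrow 0} \{V_{\hat f_n}[y, y + \epsilon\mathbf{1})\} > 0$.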
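The Fenchel conditions suggest a simple numerical check. For a candidate density with atoms on $A^*$, the quantity $\mathbb{P}_n[1_{[X \le a]}/\hat f_n(X)]/|a|$ at a grid point $a$ is at most one under (3.1) and equals one on the support of $\hat G_n$ under (3.2). Checking grid points suffices for (3.1): for any $x$, the set $\{i : X_i \le x\}$ equals $\{i : X_i \le a\}$ for the largest grid point $a \le x$ (if there is none, the left side of (3.1) is zero), and $|a| \le |x|$. The Python sketch below produces a candidate by a fixed-support EM iteration for the mixing weights on $A^*$, using Theorem 3.1(a) to restrict the atoms to the grid. This EM iteration is a standard device for mixture weights that we substitute for illustration; it is not the convex-geometric argument used in Lemmas 3.2 to 3.4, and the helper names are hypothetical.

```python
import itertools
import math


def grid_kernels(data):
    """Atoms a_j of the grid A* and kernel values phi[j][i] = 1{X_i <= a_j} / |a_j|."""
    d = len(data[0])
    axes = [sorted({row[k] for row in data}) for k in range(d)]
    atoms = [tuple(p) for p in itertools.product(*axes)]
    phi = [[1.0 / math.prod(a) if all(xk <= ak for xk, ak in zip(x, a)) else 0.0
            for x in data]
           for a in atoms]
    return atoms, phi


def em_on_grid(data, iters=500):
    """Fixed-support EM for SMU mixing weights on A* (an illustrative, swapped-in method).

    Returns the atoms, the final weights, the Fenchel ratios
    ratio[j] = P_n[1{X <= a_j} / f(X)] / |a_j| evaluated at the weights entering
    the final step, and the log-likelihood path.
    """
    atoms, phi = grid_kernels(data)
    n, m = len(data), len(atoms)
    w = [1.0 / m] * m  # start from uniform weights on the grid
    loglik, ratio = [], []
    for _ in range(iters):
        f = [sum(w[j] * phi[j][i] for j in range(m)) for i in range(n)]
        loglik.append(sum(math.log(fi) for fi in f))
        ratio = [sum(phi[j][i] / f[i] for i in range(n)) / n for j in range(m)]
        # EM step: sum_j w_j * ratio_j == 1, so the weights stay a probability vector
        w = [w[j] * ratio[j] for j in range(m)]
    return atoms, w, ratio, loglik


data = [(0.2, 0.5), (0.6, 0.3)]
atoms, w, ratio, loglik = em_on_grid(data, iters=500)
for a, wj, rj in zip(atoms, w, ratio):
    print(a, round(wj, 6), round(rj, 6))
```

For these two observations the iteration should put weight close to $1/2$ on each of $(0.2, 0.5)$ and $(0.6, 0.3)$, with a ratio of about $0.93 < 1$ at the unused atom $(0.6, 0.5)$ and ratio $0$ at $(0.2, 0.3)$, which dominates no observation; this is consistent with (3.1) and (3.2).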

Maximum likelihood estimation in mixture models has been studied in 
general by Lindsay [1983], and this material is nicely summarized in Lindsay 
[1995, Chapter 5]. To prove the present theorem, we will therefore appeal 
to the results in Lindsay [1995, Chapter 5] and Rockafellar [1970]. We begin 
with three lemmas. 



Lemma 3.2. The support set of the mixing measure $\hat G_n$ of any MLE $\hat f_n$ is contained in the grid $A^* \subset (0,\infty)^d$ generated by the observed data $X_1, \ldots, X_n$; i.e., $\mathrm{supp}(\hat G_n) \subseteq A^*$.



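Proof. First we show that $\mathrm{supp}(\hat G_n) \subseteq (0, X^*]$, where $X^* = X_1 \vee \cdots \vee X_n$ and the maxima are taken coordinatewise. If $\hat f_n$ maximizes $L_n(f) = n\mathbb{P}_n \log f(X)$ over $f \in \mathcal{F}_{\mathrm{SMU}}(d)$ and there is some $y \in (0,\infty)^d \setminus (0, X^*]$ with $y \in \mathrm{supp}(\hat G_n)$, then $\hat f_n(y) > 0$. Since $\hat f_n$ is block-decreasing, this implies that $0 < \int_{(0,X^*]} \hat f_n(x)\, dx = \beta < 1$. Then consider $\tilde f(x) = (\hat f_n(x)/\beta)\, 1_{(0,X^*]}(x)$; it is easily seen that $\tilde f \in \mathcal{F}_{\mathrm{SMU}}(d)$ and that it has greater likelihood than $\hat f_n$, contradicting the assumption that $\hat f_n$ maximizes the likelihood. Thus $\mathrm{supp}(\hat G_n) \subseteq (0, X^*]$, and we may restrict attention to the class of estimators with support contained in $(0, X^*]$, say $\mathcal{K}^*(d)$.

Suppose that $\hat f_n \in \mathcal{K}^*(d)$. Consider the mixing measure $\tilde G_n$ defined by

$$\tilde G_n = \frac{\sum_{j : W_j \in A^*} \pi_j\, \delta_{W_j}}{\sum_{j : W_j \in A^*} \pi_j} = C \sum_{j : W_j \in A^*} \pi_j\, \delta_{W_j},$$

where

$$\pi_j = (-1)^d V_{\hat f_n}[W_j, W_j^+) \cdot |W_j| \qquad \text{for } W_j \in A^*,$$

and $W_j^+$ defines the smallest rectangle above and to the right of $W_j$ in the partition of $[0, X^*]$ defined by the data. Then it is easy to see that

$$\tilde f(x) = \int_{(0,\infty)^d} \frac{1}{|u|}\, 1_{(0,u]}(x)\, d\tilde G_n(u)$$

satisfies

$$\tilde f(W_j) = C \sum_{k : W_k \ge W_j} \frac{\pi_k}{|W_k|} = C (-1)^d V_{\hat f_n}[W_j, 2X^*) = C \hat f_n(W_j),$$

and this implies that

$$\tilde f(x) = C \sum_{j : W_j \in A^*} \hat f_n(W_j)\, 1_{(W_j^-, W_j]}(x),$$

where $W_j^-$ defines the smallest rectangle below and to the left of $W_j$ in the partition of $[0, X^*]$ defined by the data. If $\hat f_n \neq \tilde f$, then $\hat f_n$ is not constant on every cell $(W_j^-, W_j]$ (otherwise $C = 1$ and $\tilde f = \hat f_n$), so there exists $y \in (W_j^-, W_j]$ for some $W_j \in A^*$ such that $\hat f_n(y) \neq \hat f_n(W_j)$, and then necessarily $\hat f_n(y) > \hat f_n(W_j)$, since $\hat f_n$ is block-decreasing. This yields, since $\hat f_n \in \mathcal{K}^*(d)$,

$$1 = \int \tilde f(x)\, dx = C \sum_{j : W_j \in A^*} \hat f_n(W_j) \int_{(W_j^-, W_j]} dx < C \sum_{j : W_j \in A^*} \int_{(W_j^-, W_j]} \hat f_n(x)\, dx = C \int_{(0,X^*]} \hat f_n(x)\, dx = C,$$

so that $C > 1$. Since every observation $X_i$ lies in $A^*$, we have $\tilde f(X_i) = C \hat f_n(X_i) > \hat f_n(X_i)$ for all $i$; thus $\tilde f$ has a greater log-likelihood than $\hat f_n$, a contradiction. Hence $\hat f_n = \tilde f$, and by the identifiability in Theorem 2.3, $\mathrm{supp}(\hat G_n) = \mathrm{supp}(\tilde G_n) \subseteq A^*$. ■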

Now we can prove uniqueness of the MLEs $\hat f_n$ and $\hat G_n$.

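Lemma 3.3. There exists a set of points $\mathcal{Y} = \{y_1, \ldots, y_m\} \subset (0,\infty)^d$ with $m \le n$ such that a $\mathcal{F}_{\mathrm{SMU}}(d)$ density $\hat f_n$ with corresponding mixing measure $\hat G_n$ is the MLE only if $\mathrm{supp}(\hat G_n) \subseteq \mathcal{Y}$. Thus any MLE has the form

$$\hat f_n(x) = \sum_{j=1}^m \pi_j\, \frac{1}{|y_j|}\, 1_{(0,y_j]}(x), \tag{3.3}$$

where $\pi_j \ge 0$ and $\sum_{j=1}^m \pi_j = 1$. Moreover, the vector $(\hat f_n(X_i))_{i=1}^n$ is unique.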
Proof. As in Lindsay [1983, 1995], define $\Gamma(u) \in [0,\infty)^n$ by

$$\Gamma(u) := \left( \frac{1}{|u|}\, 1_{(0,u]}(X_1), \ldots, \frac{1}{|u|}\, 1_{(0,u]}(X_n) \right),$$

and define the set $\Gamma = \{\Gamma(u) \mid u \in (0,\infty)^d\}$. Then $\Gamma$ is a closed and bounded, hence compact, subset of $[0,\infty)^n$. Thus by Rockafellar [1970, Theorem 17.2], $\mathrm{conv}(\Gamma) = \overline{\mathrm{conv}}(\Gamma) = \mathrm{conv}(\overline{\Gamma})$ is also a compact subset of $[0,\infty)^n$. Thus the continuous function $\prod_{i=1}^n z_i$ attains its supremum on $\mathrm{conv}(\Gamma)$. Let $S = \mathrm{argmax}_{z \in \mathrm{conv}(\Gamma)} \sum_{i=1}^n \log z_i$. Since the intersection of $\Gamma$ and the interior $(0,\infty)^n$ of $[0,\infty)^n$ is not empty (it contains $\Gamma(X^*)$), we have $S \subset (0,\infty)^n$. Since $\sum_{i=1}^n \log z_i$ is strictly concave, $S$ consists of a single point, $\hat f = (\hat f_i)_{i=1}^n > 0$. Therefore, for any MLE $\hat f_n$, the vector $(\hat f_n(X_i))_{i=1}^n$ is unique. Note that the gradient of $\sum_{i=1}^n \log z_i$ at $\hat f$ is $1/\hat f = (1/\hat f_i)_{i=1}^n$.

Now $\dim(\mathrm{conv}(\Gamma)) = n$: if we consider the $n$ points $u_i = X_i$, then the $n$ vectors $\Gamma(u_i) = (1_{(0,X_i]}(X_1), \ldots, 1_{(0,X_i]}(X_n))/|X_i|$, $i = 1, \ldots, n$, are almost surely linearly independent. (In fact, the matrix $M$ with rows $|X_i|\,\Gamma(X_i)$, $i = 1, \ldots, n$, has $\det(M) = 1$ a.s. if the $X_i$'s are i.i.d. with any density $f$.) By Rockafellar [1970, Theorem 27.4], the vector $1/\hat f$ belongs to the normal cone of $\mathrm{conv}(\Gamma)$ at $\hat f$. Since $1/\hat f > 0$, we have $\hat f \in \partial(\mathrm{conv}(\Gamma))$, and the plane $\tau$ defined by $\sum_{i=1}^n z_i/\hat f_i = n$ is a support plane of $\mathrm{conv}(\Gamma)$ at $\hat f$. Thus, for $v_i = 1/(n\hat f_i)$, $i = 1, \ldots, n$, it follows that

$$q(u) := |u| - \sum_{i=1}^n v_i\, 1_{(0,u]}(X_i) \ge 0$$

for all $u \in [0,\infty)^d$, and $q(u) = 0$ if $u = 0$ or $\Gamma(u) \in \tau$. We let $\mathcal{Y}$ denote the set of vectors $u$ such that $\Gamma(u) \in \tau$, i.e., $\Gamma(\mathcal{Y}) = \tau \cap \Gamma$.

The intersection Tnconv(F) is an exposed face of conv(F); see e.g. Rockafellar 
[1970, p. 162]. By Rockafellar [1970, Theorem 18.3], rnconv(F) = conv(F(3;)), 
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MULTIVARIATE MONOTONE DENSITIES 17 

and by Theorem 18.1, supp(G„) C y. This imphes that for any MLE /„, 
the support of the corresponding mixing measure G„ is a subset of y, and 
thus any MLE has the form (3.3) with y • G 3^ for j = 1, . . . , m. To see that 
m < n, note that yj £ y C A* satisfy 

n 

(3.4) \yj\ = '^Vil^o^y^]{Xi) = {v,\yj\r{yj)), j = l,...,m. 
Suppose that the vectors $\{|y_j|\Gamma(y_j)\}_{j=1}^m$ are linearly dependent; i.e.
$$\sum_{j=1}^m b_j |y_j|\Gamma(y_j) = 0$$
in $\mathbb{R}^n$ for some $b_j$, $j = 1, \ldots, m$, not all zero. Since all the coordinates of the vectors $|y_j|\Gamma(y_j)$ take values in $\{0, 1\}$, this system of equations is algebraically equivalent to the same system in which all the $b_j$'s take only integer values, i.e. $b_j \in \mathbb{Z}$ for $j = 1, \ldots, m$.

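Then it follows, on the one hand, that
$$\sum_{j=1}^m b_j \langle v, |y_j|\Gamma(y_j)\rangle = \sum_{j=1}^m b_j \sum_{i=1}^n v_i 1_{(0,y_j]}(X_i)$$
and, on the other hand, that
$$\sum_{j=1}^m b_j \langle v, |y_j|\Gamma(y_j)\rangle = \Big\langle v, \sum_{j=1}^m b_j |y_j|\Gamma(y_j)\Big\rangle = \langle v, 0\rangle = 0,$$
and hence, by (3.4), $\sum_{j=1}^m b_j |y_j| = 0$, or, since $y_j = W_{i_j} \in A^*$ for some $i_j$,
$$\sum_{j=1}^m b_j |W_{i_j}| = 0$$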

with all $b_j \in \mathbb{Z}$. But, for fixed integers $b_j$ that are not all zero, this equation holds only on an event of $P_0$-probability 0. That is, for any fixed vector $b = (b_j)_{j=1}^m \ne 0$ with all $b_j \in \mathbb{Z}$ (and any fixed choice of the indices $i_j$), the function $f_b(X_1, \ldots, X_n) = \sum_{j=1}^m b_j |W_{i_j}|$ is a nonzero polynomial in the coordinates of $X_1, \ldots, X_n$, so its zero set has Lebesgue measure zero and $P_0(f_b(X_1, \ldots, X_n) = 0) = 0$. Since $\mathbb{Z}^m$ is countable (and there are finitely many choices of the indices), $P_0\big(\bigcup_b \{f_b(X_1, \ldots, X_n) = 0\}\big) = 0$. Thus $P_0\big(\bigcap_b \{f_b(X_1, \ldots, X_n) \ne 0\}\big) = 1$. Hence the linear dependence condition only holds on an event with probability 0.

Thus the vectors $|y_j|\Gamma(y_j)$, $j = 1, \ldots, m$, are linearly independent $P_0$-almost surely, and hence $m \le n$ ($P_0$-almost surely). ■

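Lemma 3.4. The discrete mixing measure $\hat G_n$ which defines an MLE is $P_0$-almost surely unique.

Proof. Suppose that there exist two different MLEs $\hat f_n^1$ and $\hat f_n^2$. Then
$$\hat f_n^l(x) = \sum_{j=1}^m \pi_j^l \frac{1_{(0,y_j]}(x)}{|y_j|}, \quad l = 1, 2,$$
where $\pi_j^l \ge 0$ and $\sum_{j=1}^m \pi_j^l = 1$ for $l = 1, 2$. Therefore
$$\sum_{j=1}^m r_j \frac{1_{(0,y_j]}(X_i)}{|y_j|} = 0, \quad i = 1, \ldots, n,$$
where $r_j = \pi_j^1 - \pi_j^2$; that is, the function $x \mapsto \sum_{j=1}^m r_j 1_{(0,y_j]}(x)/|y_j|$ has at least $n$ zeros (since we know that
$$(\hat f_n^1(X_i))_{i=1}^n = (\hat f_n^2(X_i))_{i=1}^n = (\hat f_i)_{i=1}^n$$
is unique). So uniqueness holds if the vectors
$$(1_{(0,y_j]}(X_i))_{i=1}^n \in \{0,1\}^n, \quad j = 1, \ldots, m \le n,$$
are (almost surely) linearly independent. But this follows from the proof of Lemma 3.3. ■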

Theorem 3.1 does not assert that the MLE is always unique. An MLE is $P_0$-almost surely unique, but we now present an example in which there exist infinitely many MLEs.

Example 3.1. [An MLE in $\mathcal{F}_{SMU}$ is not always unique] To be able to graphically illustrate the set $\Gamma$ in the proof of Theorem 3.1, we need to restrict consideration to $n = 2$, and in order to be able to graphically illustrate the MLE(s) we need to restrict consideration to $d = 2$. Suppose that $X_1 = (1,3)$ and $X_2 = (3,2)$ are the observation points. The set
$$\Gamma = \left\{ \frac{(1_{(0,u]}(X_1), 1_{(0,u]}(X_2))}{u_1 u_2} : u = (u_1, u_2) \in (0,\infty)^2 \right\}$$
and its convex hull, $\mathrm{Conv}(\Gamma)$, are illustrated in Figure 1.

[Figure 1. Panel (a): the set $\Gamma$, represented by the union of the bold lines. Panel (b): the set $\mathrm{Conv}(\Gamma)$, represented by the shaded area, with the point $\hat f$ marked.]
Fig 1. The sets $\Gamma$ and $\mathrm{Conv}(\Gamma)$ based on two observations: $X_1 = (1,3)$ and $X_2 = (3,2)$.

Using Lindsay [1995, Theorem 22, p. 118], it follows that any MLE $\hat f_2$ will have a unique value for $\hat f = (\hat f_2(X_1), \hat f_2(X_2))$, given by $\hat f = (w_1^{-1}, w_2^{-1})$, where $w = (w_1, w_2)$ maximizes the function $(w_1, w_2) \mapsto \log(w_1 w_2)$ on the set
$$\left\{ (w_1, w_2) \in (0,\infty)^2 : \frac{w_1}{3} \le 2 \ \text{ and } \ \frac{w_2}{6} \le 2 \right\}.$$
It is immediate that $w = (6, 12)$, from which we conclude that $\hat f = (1/6, 1/12)$. This point has exactly two representations as a convex combination of extreme elements in $\mathrm{Conv}(\Gamma)$ (see Figure 1(b) again):
$$\left(\frac16, \frac1{12}\right) = \frac12\left(\frac13, 0\right) + \frac12\left(0, \frac16\right)$$
and
$$\left(\frac16, \frac1{12}\right) = \frac14\left(\frac13, 0\right) + \frac34\left(\frac19, \frac19\right).$$



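These two convex combinations yield two different maximum likelihood estimators,
$$\hat f_2^{(1)}(x) = \frac12\cdot\frac{1_{(0,(1,3)]}(x)}{3} + \frac12\cdot\frac{1_{(0,(3,2)]}(x)}{6} \quad \text{and} \quad \hat f_2^{(2)}(x) = \frac14\cdot\frac{1_{(0,(1,3)]}(x)}{3} + \frac34\cdot\frac{1_{(0,(3,3)]}(x)}{9},$$
as shown in Figures 2(a) and 2(b).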

It should be noted, however, that infinitely many maximum likelihood estimators exist in this case. Observe that the hyperplane that passes through $\hat f$ intersects $\mathrm{Conv}(\Gamma)$ on the line segment joining the points $(0, 1/6)$ and $(1/3, 0)$. Then $\hat f$ can be written in infinitely many ways as a convex combination of points on this line segment. However, the corresponding MLEs will no longer be supported solely on the grid generated by the data. ■

[Figure 2. Panel (a): Example 3.1, MLE 1. Panel (b): Example 3.1, MLE 2.]
Fig 2. Two maximum likelihood estimators in $\mathcal{F}_{SMU}(2)$, supported on the grid generated by the data: $X_1 = (1,3)$ and $X_2 = (3,2)$. The two figures show the contour/level plots of the respective maximum likelihood densities.
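The MLE of Example 3.1 can also be checked numerically. The sketch below is not the Lindsay duality argument used above: it swaps in the standard EM iteration for mixture weights on a fixed finite support, here assumed to be the four points of the grid generated by the data, and it assumes NumPy. Its fitted values at the data should approach $\hat f = (1/6, 1/12)$, whereas the weights it returns are just one of the infinitely many maximizers.

```python
import numpy as np

# Observations of Example 3.1 (d = 2, n = 2).
X = np.array([[1.0, 3.0], [3.0, 2.0]])

# Candidate support points: the grid generated by the data coordinates,
# i.e. (1, 2), (1, 3), (3, 2), (3, 3).
grid = np.array([[u1, u2] for u1 in np.unique(X[:, 0]) for u2 in np.unique(X[:, 1])])

def uniform_kernel(X, U):
    """K[i, j] = 1{X_i <= U_j} / |U_j|, the Uniform((0, U_j]) density at X_i."""
    inside = np.all(X[:, None, :] <= U[None, :, :], axis=2)
    return inside / np.prod(U, axis=1)[None, :]

K = uniform_kernel(X, grid)
w = np.full(len(grid), 1.0 / len(grid))  # start from equal weights

for _ in range(2000):
    f = K @ w                              # current density values at the data
    w = w * (K / f[:, None]).mean(axis=0)  # EM update; preserves sum(w) == 1

f_hat = K @ w
print(np.round(w, 4))      # one maximizing weight vector on the grid
print(np.round(f_hat, 6))  # close to (1/6, 1/12)
```

The point $(1,2)$ dominates neither observation, so its weight is set to zero after the first update; the remaining weight is shared among $(1,3)$, $(3,2)$ and $(3,3)$ in a way that depends on the starting values.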

3.2. Strong pointwise consistency of the MLE. Let $X_1, X_2, \ldots, X_n, \ldots$ be the coordinate random elements on the (completed) infinite product space $(\Omega^\infty, \mathcal{A}^\infty, P^\infty)$ such that these coordinates are i.i.d. according to $f_0 = f_{G_0}$ on $(0,\infty)^d$. Let $A \in \mathcal{A}^\infty$ be the event (with $P^\infty$-probability one) that for each $n \in \mathbb{N}$ there exists a unique SMU density, $\hat f_n = f_{\hat G_n}$, maximizing the log-likelihood.

From Theorem 2.3 we have that for each $n \in \mathbb{N}$ and a fixed $\omega \in A$, there exists a unique Borel probability measure $\hat G_n$ on $((0,\infty)^d, \|\cdot\|_2)$ such that
$$\hat f_n(x) = \int_{(0,\infty)^d} \frac{1}{|u|} 1_{(0,u]}(x)\, d\hat G_n(u) = \int_{u \ge x} \frac{1}{|u|}\, d\hat G_n(u) \tag{3.5}$$
holds true for all $x \in (0,\infty)^d$. We are ready to formulate and prove the following proposition.

Proposition 3.5. [Strong consistency of the MLE in $\mathcal{F}_{SMU}$]

(a) (i) The sequence of maximum likelihood mixing distributions $\{\hat G_n\}_{n=1}^\infty$ converges weakly to $G_0$ as $n \to \infty$, $P^\infty$-almost surely.

(ii) In addition, for Lebesgue almost all $x \in (0,\infty)^d$, $\hat f_n(x) \to_{a.s.} f_0(x)$ as $n \to \infty$. In particular, if $f_0$ is continuous at $x \in (0,\infty)^d$, then
$$|\hat f_n(x) - f_0(x)| \to_{a.s.} 0$$
as $n \to \infty$.

(b) The sequence of maximum likelihood estimators, $\{\hat f_n\}_{n=1}^\infty$, is strongly consistent in the total variation (or $L_1$) and in the Hellinger metrics. That is,
$$\int_{(0,\infty)^d} |\hat f_n(x) - f_0(x)|\, dx \to_{a.s.} 0 \quad \text{as } n \to \infty,$$
and, with $h^2(p,q) = (1/2)\int \{\sqrt{p(x)} - \sqrt{q(x)}\}^2 dx$,
$$h^2(\hat f_n, f_0) \to_{a.s.} 0 \quad \text{as } n \to \infty.$$
Proof. (a)(i) To be able to apply Theorems 3.4, 3.5 and 3.7 of Pfanzagl [1988], with the refinement on page 143 of the same article, we need to provide the relevant setup and establish the assumptions of Pfanzagl's theorems. We do this below.

Let $C_0((0,\infty)^d, \|\cdot\|_2)$ denote the set of all real-valued continuous functions on $(0,\infty)^d$ that vanish at $\infty$. Let $\Theta_*$ denote the set of all Borel sub-probability measures on $(0,\infty)^d$, equipped with the vague topology $\tau$, which makes the space a compact, metrizable topological space, and thus one with a countable base. It is also a convex subset of the linear space of all finite signed Borel measures on $((0,\infty)^d, \|\cdot\|_2)$. For clarity, the vague topology is the smallest topology that makes the functions
$$\nu \mapsto \int_{(0,\infty)^d} g\, d\nu$$
continuous, for each $g \in C_0((0,\infty)^d, \|\cdot\|_2)$. By metrizability, the topology $\tau$ is completely characterized by convergent sequences, $\theta_n^* \Rightarrow \theta^*$ as $n \to \infty$, on $(\Theta_*, \tau)$.

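Let also $\Theta \subset \Theta_*$ be the set of all Borel probability measures on $(0,\infty)^d$, and notice that $\mu \in \Theta$. Also, for each $\theta_* \in \Theta_*$ there exist a unique $c \in [0,1]$ and a unique $\theta \in \Theta$ such that $\theta_* = c\theta$. Further, notice that, letting $m(\nu, \cdot) = f_\nu(\cdot)$ for each $\nu \in \Theta_*$, and $M_n(\cdot) = \mathbb{P}_n \log\{m(\cdot, X)\}$, we have
$$M_n(\theta_*) = \log\{c\} + M_n(\theta) \le M_n(\theta), \quad \text{since } c \in [0,1],$$
whence $\sup_{\theta_* \in \Theta_*} M_n(\theta_*) = \sup_{\theta \in \Theta} M_n(\theta)$.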

With the Lebesgue measure $\lambda$ as reference measure, and for each $\nu \in \Theta_*$, let $P_\nu \in \Theta_*$ be the Borel sub-probability measure on $((0,\infty)^d, \|\cdot\|_2)$ whose Radon-Nikodym derivative with respect to $\lambda$ is $f_\nu$, Lebesgue almost surely. Then, by virtue of Fubini-Tonelli, $P_\nu \in \Theta$ when and only when $\nu \in \Theta$. Also, notice that for each fixed $x \in (0,\infty)^d$, the functional $\nu \mapsto f_\nu(x)$ is not vaguely continuous at any $\nu \in \Theta_*$ that has a discontinuity point on the boundary of $[x,\infty)$. However, since for a fixed $x \in (0,\infty)^d$ the function $y \mapsto 1_{[x,\infty)}(y)/|y|$ is easily seen to be an upper semi-continuous function on $(0,\infty)^d$ vanishing at $\infty$, Doob [1994, Theorem 10, p. 138] applies and asserts that the function $\nu \mapsto f_\nu(x)$ on $(\Theta_*, \tau)$ is itself (vaguely) upper semi-continuous. Since this holds for all $x \in (0,\infty)^d$, it holds almost surely. Also, the mapping $\nu \mapsto f_\nu(x)$ is affine on $\Theta_*$ (and hence also concave).

It remains to establish that for each fixed $\tau$-open subset $U$ of $\Theta_*$, the real-valued function $T_U(\cdot)$ on $(0,\infty)^d$ defined by
$$T_U(x) = \sup_{\nu \in U}\left\{ \int_{(0,\infty)^d} \frac{1}{|u|} 1_{(0,u]}(x)\, d\nu(u) \right\}$$
is an $\mathcal{A}$-measurable function. We can choose $\mathcal{A}$ to be the Lebesgue $\sigma$-field, in which case measurability follows by observing that $T_U(\cdot)$ is a block-decreasing function and appealing to Proposition 2.5.

We now apply our setup to Theorem 3.4 of Pfanzagl [1988] and further appeal to the fact that a vaguely convergent sequence of probability measures whose limit is a probability measure is, in fact, weakly convergent. This gives the desired conclusion: the random sequence of maximum likelihood mixing probability measures $\{\hat G_n\}_{n=1}^\infty$ converges weakly to $G_0$ as $n \to \infty$, $P^\infty$-almost surely.

(ii) Combining the fact that, for each fixed $x \in (0,\infty)^d$, $\nu \mapsto f_\nu(x)$ is vaguely upper semi-continuous on $\Theta_*$ with the conclusion of part (a)(i), we get
$$\limsup_{n\to\infty} \{\hat f_n(x)\} \le f_0(x), \quad P^\infty\text{-a.s. for all } x \in (0,\infty)^d. \tag{3.6}$$

Let
$$F_{G_0}(x) = \int_{(0,\infty)^d} \frac{|x \wedge u|}{|u|}\, dG_0(u) \quad \text{and} \quad F_{\hat G_n}(x) = \int_{(0,\infty)^d} \frac{|x \wedge u|}{|u|}\, d\hat G_n(u),$$
where $x \wedge u$ denotes the componentwise minimum, be the distribution functions corresponding to the densities $f_0(\cdot)$ and $\hat f_n(\cdot)$, respectively, $n \in \mathbb{N}$. These distribution functions are everywhere continuous on the Euclidean set $(0,\infty)^d$. In fact, since for each fixed $x \in (0,\infty)^d$ the function $u \mapsto |x \wedge u|/|u|$ is bounded (by 1) and continuous on $(0,\infty)^d$, the convergence
$$F_{\hat G_n}(x) \to_{a.s.} F_{G_0}(x) \quad \text{for all } x \in (0,\infty)^d \tag{3.7}$$
follows directly from the definition of almost sure weak convergence of the mixing random measures $\{\hat G_n\}_{n=1}^\infty$ to $G_0$, established in part (a)(i).
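The formula for $F_G$ and the box averages used below in (3.9) and (3.10) can be illustrated for a discrete mixing measure. In this sketch (assuming NumPy; the atoms `U` and weights `w` are hypothetical), `box_volume` computes $V_F(a,b]$ by inclusion-exclusion over the $2^d$ vertices of the box; the checks confirm that it equals $\int_{(a,b]} f_G$ and that the box average lies between $f_G(b)$ and $f_G(a)$, as block-decreasingness requires.

```python
import itertools
import numpy as np

def smu_density(x, U, w):
    """f_G(x) = sum_j w_j 1{x <= u_j} / |u_j| for G = sum_j w_j delta_{u_j}."""
    inside = np.all(np.asarray(x, float)[None, :] <= U, axis=1)
    return float(np.sum(w * inside / np.prod(U, axis=1)))

def smu_cdf(x, U, w):
    """F_G(x) = sum_j w_j |x ^ u_j| / |u_j|, with ^ the componentwise minimum."""
    mins = np.minimum(np.asarray(x, float)[None, :], U)
    return float(np.sum(w * np.prod(mins, axis=1) / np.prod(U, axis=1)))

def box_volume(F, a, b):
    """V_F(a, b]: inclusion-exclusion of F over the 2^d vertices of (a, b]."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(a)):
        vertex = np.where(np.array(bits) == 1, b, a)
        # A vertex using k lower endpoints enters with sign (-1)^k.
        total += (-1) ** (len(a) - sum(bits)) * F(vertex)
    return total

# Hypothetical discrete mixing measure on (0, inf)^2.
U = np.array([[2.0, 4.0], [1.0, 1.0], [3.0, 2.0]])
w = np.array([0.5, 0.3, 0.2])

a, b = np.array([0.5, 0.5]), np.array([1.5, 1.5])
vol = box_volume(lambda z: smu_cdf(z, U, w), a, b)

# Direct integral of f_G over (a, b]: each atom contributes |(a, b] ∩ (0, u]| / |u|.
overlap = np.prod(np.clip(np.minimum(b, U) - a, 0.0, None), axis=1)
exact = float(np.sum(w * overlap / np.prod(U, axis=1)))
average = vol / np.prod(b - a)
```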

Let $B$ be the set of points in $(0,\infty)^d$ at which $f_0$ is continuous. Then $B^c$ has Lebesgue measure zero, $\lambda(B^c) = 0$, exactly because $f_0$ is discontinuous only on the boundary $\partial[x_0,\infty)$ for a (possibly non-existent) $x_0 \in (0,\infty)^d$ where $G_0$ is discontinuous (i.e. such that $G_0(\{x_0\}) > 0$). Since $G_0$ can have at most countably many discontinuity points $x_0 \in (0,\infty)^d$, and since $\lambda(\partial[x_0,\infty)) = 0$, we get by countable subadditivity of $\lambda$ that indeed $\lambda(B^c) = 0$.

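Fix arbitrary $x \in B$ and $\epsilon > 0$. Then, since $f_0$ is lower semi-continuous at $x$, there exists an open neighborhood $U_{x,\epsilon}$ of $x$ such that $f_0(y) > f_0(x) - \epsilon$ for every $y \in U_{x,\epsilon}$. In particular, there exists an $x_\epsilon \in U_{x,\epsilon}$ with $x_\epsilon > x$ satisfying $f_0(x_\epsilon) > f_0(x) - \epsilon$. Since $f_0$ is block-decreasing, we have
$$f_0(x) - \epsilon < f_0(x_\epsilon) \le \frac{V_{F_{G_0}}(x, x_\epsilon]}{\lambda((x, x_\epsilon])}. \tag{3.8}$$
Further, for each fixed $n \in \mathbb{N}$, since $\hat f_n(\cdot)$ is block-decreasing (as an SMU density), we have
$$\hat f_n(x) \ge \frac{\int_{(x, x_\epsilon]} \hat f_n(y)\, dy}{\lambda((x, x_\epsilon])} \tag{3.9}$$
$$= \frac{V_{F_{\hat G_n}}(x, x_\epsilon]}{\lambda((x, x_\epsilon])}. \tag{3.10}$$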

Equation (3.7) further implies that
$$V_{F_{\hat G_n}}(x, x_\epsilon] \to_{a.s.} V_{F_{G_0}}(x, x_\epsilon], \quad \text{as } n \to \infty. \tag{3.11}$$
Combining equations (3.8) to (3.11) and the fact that $\epsilon > 0$ was arbitrary, we get
$$\liminf_{n\to\infty} \{\hat f_n(x)\} \ge f_0(x), \quad P^\infty\text{-a.s. for } x \in B. \tag{3.12}$$
Equations (3.6) and (3.12) yield the assertion: for Lebesgue almost all $x \in (0,\infty)^d$ (and, in particular, at the points of continuity of $f_0$), $\hat f_n(x) \to_{a.s.} f_0(x)$ as $n \to \infty$.

(b) Consistency in the $L_1$ (total variation) norm is a direct consequence of part (a)(ii) and Glick's Theorem (Glick [1974]); see also Devroye [1987], p. 25.

Convergence in the Hellinger metric follows from the following well-known inequalities of Le Cam [1986, p. 46]:
$$h^2(P,Q) \le \tfrac12 \|P - Q\|_{L_1} \le h(P,Q)\{2 - h^2(P,Q)\}^{1/2},$$
where $h^2(P,Q) = 2^{-1}\int (\sqrt{dP} - \sqrt{dQ})^2$ is the squared Hellinger metric and $\|\cdot\|_{L_1}$ is the $L_1$-norm.
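The two inequalities of Le Cam are easy to probe numerically. The sketch below (assuming NumPy; random Dirichlet probability vectors are hypothetical stand-ins for $P$ and $Q$) uses the same normalization $h^2 = \frac12\int(\sqrt{dP} - \sqrt{dQ})^2$ as the text and checks both bounds on many random pairs.

```python
import numpy as np

def hellinger_sq(p, q):
    """h^2(P, Q) = (1/2) * sum (sqrt(p) - sqrt(q))^2, the normalization used above."""
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def half_l1(p, q):
    """(1/2) * ||P - Q||_{L1}, i.e. the total variation distance."""
    return 0.5 * np.sum(np.abs(p - q))

rng = np.random.default_rng(1)
results = []
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))  # hypothetical discrete distributions
    q = rng.dirichlet(np.ones(5))
    h2, tv = hellinger_sq(p, q), half_l1(p, q)
    lower_ok = h2 <= tv + 1e-12                       # h^2 <= (1/2)||P - Q||
    upper_ok = tv <= np.sqrt(h2 * (2.0 - h2)) + 1e-12  # (1/2)||P - Q|| <= h (2 - h^2)^{1/2}
    results.append(bool(lower_ok and upper_ok))
print(all(results))  # True
```

In particular, $L_1$ convergence forces $h^2 \to 0$ through the left-hand inequality, which is the direction used in part (b).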
4. A local asymptotic minimax lower bound. Let $X_i := (X_{i,1}, \ldots, X_{i,d})'$ for $i = 1, 2, \ldots, n$ be i.i.d. random vectors from a density $f \in \mathcal{F}_{SMU}(d)$. For a fixed $x_0 = (x_{0,1}, \ldots, x_{0,d})' \in (0,\infty)^d$, we want to estimate the functional $T(f) := f(x_0)$ on the basis of $X_1, \ldots, X_n$. We shall make the following assumption:

Assumption 4.1. Suppose that $f \in \mathcal{F}_{SMU}$ is continuously differentiable at $x_0$, $f(x_0) > 0$, and, in particular, that there exists an open ball $A(x_0)$ around $x_0$ such that $f$ is everywhere strictly positive on $A(x_0)$ and the partial derivatives $(\partial/\partial x_j) f$, $j \in \{1, 2, \ldots, d\}$, exist and are continuous on $A(x_0) \subset (0,\infty)^d$, with $(\partial/\partial x_j) f(x_0) < 0$. Further, we assume that the full mixed derivative of $f$ exists, is continuous on $A(x_0)$, and satisfies
$$(-1)^d \frac{\partial^d f(x)}{\partial x_1 \cdots \partial x_d}\bigg|_{x=y} > 0 \quad \text{for all } y \in A(x_0).$$


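Proposition 4.1. Suppose that $f \in \mathcal{F}_{SMU}$ satisfies Assumption 4.1 at the fixed point $x_0 \in (0,\infty)^d$. Then there is a sequence $\{f_n\} \subset \mathcal{F}_{SMU}$ such that any estimator sequence $\{T_n\}$ of $f(x_0)$ satisfies
$$\liminf_{n\to\infty} \max\left\{ E_{f_n}\left[n^{1/3}|T_n - f_n(x_0)|\right], E_f\left[n^{1/3}|T_n - f(x_0)|\right] \right\} \ge \frac{e^{-1/3}}{4}\left\{ \frac{3^{d-1}}{2^{d-2}}\, (-1)^d \frac{\partial^d f(x)}{\partial x_1 \cdots \partial x_d}\bigg|_{x=x_0} f(x_0) \right\}^{1/3}. \tag{4.1}$$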

Remark. The lower bound in Proposition 4.1 should be contrasted with a similar lower bound for estimation of $f(x_0)$ for $f \in \mathcal{F}_{BDD}$, which is derived by Pavlides [2009]. In that case the natural hypothesis is $\partial f(x_0)/\partial x_i < 0$ for $i = 1, \ldots, d$, and the resulting rate of convergence is $n^{1/(d+2)}$.

To prove Proposition 4.1 we will make use of the following lemma. It 
was established in the form presented here by Groeneboom and Jongbloed 
[1995]; see also Groeneboom [1996] and Jongbloed [2000]. 

Lemma 4.2. Let $\mathcal{F}$ be a class of densities on a measurable space $(\mathcal{X}, \mathcal{A})$ and $f$ a fixed element of $\mathcal{F}$. Let $\mathcal{F}_f$ denote any open Hellinger ball with center $f \in \mathcal{F}$. Assume that there exists a sequence $\{f_n\}_{n=1}^\infty \subset \mathcal{F}_f$ such that
$$\lim_{n\to\infty} \{\sqrt{n}\, h(f_n, f)\} = \alpha \tag{4.2}$$
and
$$\lim_{n\to\infty} |T(f_n) - T(f)| = \beta \tag{4.3}$$
both hold for some constants $0 < \alpha, \beta < \infty$, where $T$ is a functional on $\mathcal{F}$. Here, $h^2(f_n, f) = 2^{-1}\int \{\sqrt{f_n(x)} - \sqrt{f(x)}\}^2\, d\mu(x)$ is the squared Hellinger distance between the $\mu$-densities $f_n$ and $f$. Let $l(\cdot)$ be a convex function, symmetric about zero, which is non-decreasing on $[0,\infty)$. Then it holds that
$$\liminf_{n\to\infty} \{R_{n,l}(\mathcal{F}_f)\} \ge l\left(\tfrac14 \beta e^{-2\alpha^2}\right), \tag{4.4}$$
where $R_{n,l}(\mathcal{F}) = \inf_{T_n} \sup_{g \in \mathcal{F}} E_{g^{\otimes n}}\{l(T_n - T(g))\}$ is the minimax risk for estimating the functional $T(f)$ based on $n$ i.i.d. observations from $\mathcal{F}$. In particular, for the loss $l(x) = |x|$ on $\mathbb{R}$ we have
$$\liminf_{n\to\infty} \{R_{n,|\cdot|}(\mathcal{F}_f)\} \ge \tfrac14 \beta e^{-2\alpha^2}. \tag{4.5}$$

Hereafter, fix an otherwise arbitrary vector $h := (h_1, \ldots, h_d) \in (0,\infty)^d$, and define $H := \mathrm{diag}(h_1, \ldots, h_d)$. For each $k \in \mathbb{N}$, consider the perturbation rectangle
$$I_n(k) := \prod_{i=1}^d \left[x_{0,i} - n^{-1/k} h_i,\; x_{0,i} + n^{-1/k} h_i\right],$$
only for those positive integers $n \ge n_0(k, x_0, h)$ for which $I_n(k) \subset A(x_0)$ for all $n \ge n_0$. The two-dimensional case, $d = 2$, is illustrated in Figure 3. Recall Assumption 4.1. Let $b := (\partial^d/\partial x_1 \cdots \partial x_d) f(x)|_{x=x_0}$ and observe that $(-1)^d b > 0$. Finally, define the functions $h_n$ on $I_n(3d)$ as follows:
$$h_n(u) := \prod_{i=1}^d \left\{ 1_{[x_{0,i} - n^{-1/(3d)} h_i,\, x_{0,i}]}(u_i) - 1_{(x_{0,i},\, x_{0,i} + n^{-1/(3d)} h_i]}(u_i) \right\}$$
and
$$g_n(y) := b \int_{[y,\infty)} \{1_{I_n(3d)}(u) \cdot h_n(u)\}\, du,$$

where we observe that $g_n(y) \ge 0$ for all $y \in I_n(3d)$, since $x_0$ is the center of the rectangle $I_n(3d)$. In fact, consideration of the geometry of the definition of $g_n(\cdot)$ reveals that, for $y \in I_n$, $g_n(y)$ is equal to $(-1)^d b > 0$ times the volume of the rectangle $[v_n(y) \wedge y, v_n(y) \vee y]$, where $v_n(y)$ is defined as the vertex of $I_n$ that is closest in $L_2$-distance to $y \in I_n$. Since $I_n$ is a decreasing sequence of compact sets, it is then immediately clear that $g_n(y)$ is (pointwise) non-increasing in $n \in \mathbb{N}$, for each fixed $y \in (0,\infty)^d$.

[Figure 3: the rectangle $I_n(k)$ in the $(x_1, x_2)$-plane, with sides $[x_{0i} - n^{-1/k} h_i, x_{0i} + n^{-1/k} h_i]$, $i = 1, 2$.]
Fig 3. Perturbation rectangle $I_n(k)$, for the case $d = 2$, with center $x_0 = (x_{01}, x_{02})$ and $h = (h_1, h_2)$.
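The geometric description of $g_n$ can be checked directly against its defining integral. Because $h_n$ is a product over coordinates, $g_n(y) = b\prod_i \int_{y_i}^{x_{0i}+\delta_i}[1\{u \le x_{0i}\} - 1\{u > x_{0i}\}]\,du$ with $\delta_i = h_i n^{-1/(3d)}$. The sketch below (assuming NumPy; the values of $x_0$, $h$, $n$ and $b$ are hypothetical) evaluates each factor by a midpoint rule and compares the result with $(-1)^d b$ times the volume of the box between $y$ and its nearest vertex of $I_n$.

```python
import numpy as np

def factor_numeric(t, c, delta, m=20000):
    """Midpoint rule for int_t^{c+delta} [1{u <= c} - 1{u > c}] du, t in [c-delta, c+delta]."""
    edges = np.linspace(t, c + delta, m + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(np.where(mid <= c, 1.0, -1.0)) * (c + delta - t) / m

def g_numeric(y, x0, delta, b):
    """g_n(y) = b * product of the one-dimensional factors; valid for y in I_n."""
    return b * np.prod([factor_numeric(t, c, dl) for t, c, dl in zip(y, x0, delta)])

def g_geometric(y, x0, delta, b):
    """(-1)^d * b * volume of the box between y and its nearest vertex of I_n."""
    dist = np.minimum(y - (x0 - delta), (x0 + delta) - y)  # per-coordinate distance
    return (-1) ** len(y) * b * np.prod(dist)

# Hypothetical values: d = 2, b chosen so that (-1)^d b > 0, n = 1000.
x0 = np.array([1.0, 2.0])
h = np.array([0.5, 1.0])
n, b = 1000, 0.7
delta = h * n ** (-1.0 / (3 * len(x0)))  # half-widths of I_n(3d)

points = [x0, x0 - 0.5 * delta, x0 + np.array([0.9, -0.2]) * delta]
pairs = [(g_numeric(y, x0, delta, b), g_geometric(y, x0, delta, b)) for y in points]
```

The maximum of $g_n$ over $I_n$ is attained at the center, $g_n(x_0) = (-1)^d b \prod_i \delta_i$, which is the value used later in (4.13).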

Assume that $f \in \mathcal{F}_{SMU}$, and for fixed vectors $x_0, h \in (0,\infty)^d$ we further assume that $f$ satisfies Assumption 4.1. For $n \ge n_0(3d, x_0, h)$, define the perturbed density $f_n$ of $f$ at $x_0$ by
$$f_n(x) := \begin{cases} \dfrac{f(x) + \theta g_n(x)}{d_n}, & \text{if } x \in I_n(3d), \\[1ex] \dfrac{f(x)}{d_n}, & \text{if } x \in I_n^c(3d), \end{cases} \tag{4.6}$$
for some arbitrary but fixed $\theta \in (0,1)$, where $d_n$ is the normalizing constant for $f_n$, uniquely determined by $\int_{(0,\infty)^d} f_n(x)\, dx = 1$. We will see the importance of the value of $b$ and of the fact that $0 < \theta < 1$ in the following proposition, which establishes that $\{f_n\}_{n \ge n_1} \subset \mathcal{F}_{SMU}(d)$ for a sufficiently large $n_1 \in \mathbb{N}$.

Proposition 4.3. There exists a positive integer $n_1 := n_1(d, x_0, h) \ge n_0(3d, x_0, h)$ such that $f_n \in \mathcal{F}_{SMU}$ for all $n \ge n_1$.

Proof. Since $f \in \mathcal{F}_{SMU}(d)$, we get from Theorem 2.3 that
$$(-1)^d V_f[x,y] \ge 0, \quad \text{for all } d\text{-boxes } [x,y]. \tag{4.7}$$
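From the definition of $g_n(\cdot)$, we see that its full mixed partial derivative exists in a neighborhood of $x_0$. Hence, by definition and the facts that $(-1)^d b > 0$ and $\theta \in (0,1)$, we have that
$$\begin{aligned}
d_n (-1)^d \frac{\partial^d f_n}{\partial x_1 \cdots \partial x_d}(x)\bigg|_{x=y}
&= (-1)^d \frac{\partial^d f}{\partial x_1 \cdots \partial x_d}(x)\bigg|_{x=y} + \theta (-1)^d \frac{\partial^d g_n}{\partial x_1 \cdots \partial x_d}(x)\bigg|_{x=y} \\
&\ge (-1)^d \frac{\partial^d f}{\partial x_1 \cdots \partial x_d}(x)\bigg|_{x=y} - (-1)^d b\theta \\
&= (-1)^d \frac{\partial^d f}{\partial x_1 \cdots \partial x_d}(x)\bigg|_{x=y} - (-1)^d b + (1-\theta)(-1)^d b \\
&> 2^{-1}(1-\theta)(-1)^d b > 0,
\end{aligned} \tag{4.8}$$
where the second to last inequality follows from Assumption 4.1: the full mixed partial derivative of $f$ exists and is continuous at $x_0$, from which we get, by definition of continuity, that there exists a large enough positive integer $n_1 := n_1(d, x_0, h) \ge n_0(3d, x_0, h)$ such that
$$(-1)^d \frac{\partial^d f}{\partial x_1 \cdots \partial x_d}(x)\bigg|_{x=y} - (-1)^d b > -2^{-1}(1-\theta)(-1)^d b$$
holds true for all $y \in I_n(3d)$ and $n \ge n_1$. The result in (4.8) implies that
$$(-1)^d V_{f_n}[x,y] = (-1)^d \int_{[x,y]} \frac{\partial^d f_n}{\partial w_1 \cdots \partial w_d}(w)\, dw > 0$$
holds true for all $d$-boxes $[x,y]$ with $x, y \in I_n(3d)$ and $n \ge n_1$.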

The last case not considered is the one in which exactly one of $x$ and $y$, in the $d$-box $[x,y]$, is an element of $I_n(3d)$; see also Figure 4. For this case, we can appeal to Lemma 2.2 by setting $[x_0, y_0] := [x,y] \cap I_n(3d)$, the latter being well-defined since the intersection of two rectangles is itself a rectangle. Then, from Lemma 2.2 and (4.7), we have
$$(-1)^d V_{f_n}[x,y] = (-1)^d V_{f_n}[x_0, y_0] + (-1)^d \sum_{j=1}^m V_{f_n}[x_j, y_j] \ge 0 + 0 = 0,$$
exactly since $[x_j, y_j] \subset I_n^c(3d)$ for all $j \in \{1, 2, \ldots, m\}$ (where $m$ is as defined in Lemma 2.2). For completeness, notice that we were not concerned above with end-point discontinuities of $f$ (or $f_n$) on the entailed rectangles, subsets of $I_n(3d)$, as, in fact, $f$ and $f_n$ are continuous there for $n \ge n_1$, by Assumption 4.1.

All these observations finally yield that $(-1)^d V_{f_n}[x,y] \ge 0$ holds true for all $d$-boxes $[x,y]$, and thus Theorem 2.3 asserts that $f_n \in \mathcal{F}_{SMU}$ for all $n \ge n_1$. ■
[Figure 4: the rectangle $I_n(k)$ in the plane, together with two boxes that intersect it.]
Fig 4. Perturbation rectangle $I_n(k)$, for the case $d = 2$, with two rectangles intersecting $I_n(k)$ but otherwise not subsets of it.



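We are ready to prove the main proposition of this section.

Proof of Proposition 4.1. Recall Proposition 4.3. First, we establish that
$$\int_{I_n} g_n(x)\, dx = (-1)^d b \prod_{i=1}^d \{h_i^2\}\, n^{-2/3}, \tag{4.9}$$
where, hereafter, $I_n$ will be the short-hand form for $I_n(3d)$. By definition, notice that
$$\begin{aligned}
\int_{I_n} g_n(x)\, dx &= b \int_{I_n}\int_{I_n} \prod_{i=1}^d \{1[x_i \le u_i]\}\, h_n(u)\, du\, dx
= b \int_{I_n} h_n(u) \left\{ \int_{I_n} 1_{(0,u]}(x)\, dx \right\} du \\
&= b \prod_{i=1}^d \int_{x_{0i} - h_i n^{-1/(3d)}}^{x_{0i} + h_i n^{-1/(3d)}} \left[u_i - (x_{0i} - h_i n^{-1/(3d)})\right] \left[1_{[x_{0i} - h_i n^{-1/(3d)},\, x_{0i}]}(u_i) - 1_{(x_{0i},\, x_{0i} + h_i n^{-1/(3d)}]}(u_i)\right] du_i \\
&= b \prod_{i=1}^d \left\{ \int_{x_{0i} - h_i n^{-1/(3d)}}^{x_{0i}} \left[u_i - (x_{0i} - h_i n^{-1/(3d)})\right] du_i - \int_{x_{0i}}^{x_{0i} + h_i n^{-1/(3d)}} \left[u_i - (x_{0i} - h_i n^{-1/(3d)})\right] du_i \right\} \\
&= b \prod_{i=1}^d \left\{ \int_0^{h_i n^{-1/(3d)}} \left[-y + h_i n^{-1/(3d)}\right] dy - \int_0^{h_i n^{-1/(3d)}} \left[w + h_i n^{-1/(3d)}\right] dw \right\} \\
&= b \prod_{i=1}^d \int_0^{h_i n^{-1/(3d)}} (-2y)\, dy = (-1)^d b \prod_{i=1}^d \{h_i^2 n^{-2/(3d)}\} = (-1)^d b \prod_{i=1}^d \{h_i^2\}\, n^{-2/3},
\end{aligned}$$
thus yielding (4.9).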

We next derive another equality, the most important fact about it being the factor $n^{-1}$ on the right-hand side:
$$\int_{I_n} g_n^2(x)\, dx = \left(\frac23\right)^d b^2 \prod_{i=1}^d \{h_i^3\}\, n^{-1}. \tag{4.10}$$

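Before we start deriving (4.10), let us first define four rectangles $R_j^i$, $j = 1, 2, 3, 4$, for each $i \in \{1, 2, \ldots, d\}$:
(i) $R_1^i = [x_{0i} - h_i n^{-1/(3d)}, x_{0i}] \times [x_{0i} - h_i n^{-1/(3d)}, x_{0i}]$,
(ii) $R_2^i = [x_{0i} - h_i n^{-1/(3d)}, x_{0i}] \times (x_{0i}, x_{0i} + h_i n^{-1/(3d)}]$,
(iii) $R_3^i = (x_{0i}, x_{0i} + h_i n^{-1/(3d)}] \times [x_{0i} - h_i n^{-1/(3d)}, x_{0i}]$,
(iv) $R_4^i = (x_{0i}, x_{0i} + h_i n^{-1/(3d)}] \times (x_{0i}, x_{0i} + h_i n^{-1/(3d)}]$.
Then, by definition:
$$\begin{aligned}
\int_{I_n} g_n^2(x)\, dx &= b^2 \int_{I_n} \left\{ \int_{I_n} h_n(u) 1[x \le u]\, du \right\}^2 dx
= b^2 \int_{I_n}\int_{I_n}\int_{I_n} h_n(u) h_n(v) 1[x \le u \wedge v]\, dv\, du\, dx \\
&= b^2 \int_{I_n}\int_{I_n} \prod_{i=1}^d \left[(u_i \wedge v_i) - (x_{0i} - h_i n^{-1/(3d)})\right] h_n(u) h_n(v)\, dv\, du \\
&= b^2 \prod_{i=1}^d \left\{ \iint_{R_1^i \cup R_4^i} \left[(u \wedge v) - (x_{0i} - h_i n^{-1/(3d)})\right] dv\, du - 2 \iint_{R_2^i} \left[(u \wedge v) - (x_{0i} - h_i n^{-1/(3d)})\right] dv\, du \right\} \\
&= 2^d b^2 \prod_{i=1}^d \{S_{1i} + S_{2i} - S_{3i}\},
\end{aligned} \tag{4.11}$$
where the last equality follows by symmetry and Fubini-Tonelli, and the integrals in the braces are evaluated below. By a change of variable,
$$S_{1i} = \int_{x_{0i} - h_i n^{-1/(3d)}}^{x_{0i}} \int_v^{x_{0i}} \left[v - (x_{0i} - h_i n^{-1/(3d)})\right] du\, dv = \int_{x_{0i} - h_i n^{-1/(3d)}}^{x_{0i}} (x_{0i} - v)\left(v - x_{0i} + h_i n^{-1/(3d)}\right) dv = \int_0^{h_i n^{-1/(3d)}} y\left(-y + h_i n^{-1/(3d)}\right) dy,$$
while, again by a change of variable argument,
$$S_{2i} = \int_{x_{0i}}^{x_{0i} + h_i n^{-1/(3d)}} \int_v^{x_{0i} + h_i n^{-1/(3d)}} \left[v - (x_{0i} - h_i n^{-1/(3d)})\right] du\, dv = \int_{x_{0i}}^{x_{0i} + h_i n^{-1/(3d)}} \left(x_{0i} - v + h_i n^{-1/(3d)}\right)\left(v - x_{0i} + h_i n^{-1/(3d)}\right) dv = \int_0^{h_i n^{-1/(3d)}} \left(-y + h_i n^{-1/(3d)}\right)\left(y + h_i n^{-1/(3d)}\right) dy,$$
and similarly
$$S_{3i} = \int_{x_{0i} - h_i n^{-1/(3d)}}^{x_{0i}} h_i n^{-1/(3d)} \left(v - x_{0i} + h_i n^{-1/(3d)}\right) dv = h_i n^{-1/(3d)} \int_0^{h_i n^{-1/(3d)}} y\, dy.$$
Let now $q_i := h_i n^{-1/(3d)}$, for $i \in \{1, 2, \ldots, d\}$, and observe that
$$S_{1i} + S_{2i} - S_{3i} = \int_0^{q_i} \{y(q_i - y) + q_i^2 - y^2 - q_i y\}\, dy = \int_0^{q_i} (q_i^2 - 2y^2)\, dy = \frac13 q_i^3 = \frac13 h_i^3 n^{-1/d},$$
so that plugging all these into (4.11) yields the desired (4.10).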
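Identities (4.9) and (4.10), and the limit (4.13) derived next, can be cross-checked numerically. The sketch below assumes NumPy and hypothetical values of $x_0$, $h$, $b$, $f(x_0)$ and $\theta$; it uses the product form of $g_n$ (each factor is minus the distance to the nearest endpoint of the corresponding side of $I_n$), integrates each one-dimensional factor by a midpoint rule, and evaluates $n^{1/3}|f_n(x_0) - f(x_0)|$ from (4.6), (4.12) and $g_n(x_0)$.

```python
import numpy as np

def phi(t, c, delta):
    """One-dimensional factor of g_n / b: minus the distance to the nearest endpoint."""
    return -np.minimum(t - (c - delta), (c + delta) - t)

def integrals_of_g(x0, h, b, n, m=100000):
    """Midpoint-rule values of (int g_n, int g_n^2) over I_n(3d), via the product form."""
    delta = h * n ** (-1.0 / (3 * len(x0)))
    int_g, int_g2 = b, b ** 2
    for c, dl in zip(x0, delta):
        step = 2 * dl / m
        t = c - dl + (np.arange(m) + 0.5) * step  # midpoints of [c - dl, c + dl]
        vals = phi(t, c, dl)
        int_g *= vals.sum() * step
        int_g2 *= (vals ** 2).sum() * step
    return int_g, int_g2

# Hypothetical values: d = 2, (-1)^d b > 0, f(x0) = 0.3, theta = 0.5.
x0, h = np.array([1.0, 2.0]), np.array([0.5, 1.0])
b, f_x0, theta, d = 0.7, 0.3, 0.5, 2

n = 1000
int_g, int_g2 = integrals_of_g(x0, h, b, n)
closed_g = (-1) ** d * b * np.prod(h ** 2) * n ** (-2 / 3)  # right-hand side of (4.9)
closed_g2 = (2 / 3) ** d * b ** 2 * np.prod(h ** 3) / n     # right-hand side of (4.10)

def scaled_error(n):
    """n^{1/3} |f_n(x0) - f(x0)|, using (4.6), (4.12) and g_n(x0)."""
    d_n = 1 + theta * (-1) ** d * b * np.prod(h ** 2) * n ** (-2 / 3)  # (4.12)
    g_x0 = (-1) ** d * b * np.prod(h) * n ** (-1 / 3)
    return n ** (1 / 3) * abs((f_x0 + theta * g_x0) / d_n - f_x0)
```

For large $n$, `scaled_error(n)` approaches $(-1)^d b\theta\prod_i h_i$, which is the limit recorded in (4.13).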

Now, recall from the definition of $f_n$ that $\theta \in (0,1)$ was arbitrary but fixed. Also, from $\int_{(0,\infty)^d} f_n(x)\, dx = 1$ we can get an explicit expression for the normalizing constant $d_n$:
$$d_n = \int_{I_n^c} f(x)\, dx + \int_{I_n} f(x)\, dx + \theta \int_{I_n} g_n(x)\, dx = 1 + \theta \int_{I_n} g_n(x)\, dx = 1 + (-1)^d \theta b \prod_{i=1}^d \{h_i^2\}\, n^{-2/3}, \tag{4.12}$$
where the second to last equality follows from $\int_{(0,\infty)^d} f(x)\, dx = 1$, while the last equality follows from (4.9). Notice from (4.12) that $d_n \downarrow 1$ as $n \uparrow \infty$.
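Also, from the easily verifiable identity $g_n(x_0) = (-1)^d b \prod_{i=1}^d \{h_i\}\, n^{-1/3}$, we have
$$n^{1/3}|f_n(x_0) - f(x_0)| = n^{1/3}\left| \left(\frac{1}{d_n} - 1\right) f(x_0) + \frac{(-1)^d b\theta \prod_{i=1}^d \{h_i\}}{n^{1/3} d_n} \right| \to (-1)^d b\theta \prod_{i=1}^d \{h_i\} \; (> 0), \quad \text{as } n \to \infty. \tag{4.13}$$
Also,
$$2n h^2(f_n, f) = n \int_{I_n} \frac{\{f_n(x) - f(x)\}^2}{\{\sqrt{f_n(x)} + \sqrt{f(x)}\}^2}\, dx + \delta_n \int_{I_n^c} f(x)\, dx, \tag{4.14}$$
where
$$\delta_n := n\left(\frac{1}{\sqrt{d_n}} - 1\right)^2 = n\left(\frac{1}{\sqrt{1 + O(n^{-2/3})}} - 1\right)^2 \to 0, \quad \text{as } n \to \infty,$$
with the convergence in the last display following from (4.12). Applying this to (4.14), we have: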



$$2n h^2(f_n, f) = n \int_{I_n} \frac{\{f_n(x) - f(x)\}^2}{\{\sqrt{f_n(x)} + \sqrt{f(x)}\}^2}\, dx + o(1) \tag{4.15}$$
as $n \to \infty$, because $0 < \int_{I_n^c} f(x)\, dx < 1$.

For fixed $n \in \mathbb{N}$ such that $f$ is continuous and strictly positive on $I_n$ and $g_n$ is continuous and non-negative there, let $x_{(n)}$ and $x^{(n)}$ denote, respectively, a minimizer and a maximizer of $f$ on the compact set $I_n$. Let also $y_{(n)}$ and $y^{(n)}$ denote, respectively, a minimizer and a maximizer of $g_n$ on the compact set $I_n$. Observe that, since $I_n$ is a decreasing sequence of compact sets converging to $\{x_0\}$, all of $x_{(n)}$, $x^{(n)}$, $y_{(n)}$ and $y^{(n)}$ converge to $x_0$ as $n \to \infty$. Also,
$$\sup_{x \in I_n} \left| \frac{f_n(x) - f(x)}{f(x)} \right| = \sup_{x \in I_n} \left| \frac{1}{d_n} - 1 + \frac{\theta g_n(x)}{d_n f(x)} \right| \le \left(1 - \frac{1}{d_n}\right) + \frac{\theta \sup_{x \in I_n}\{g_n(x)\}}{d_n \inf_{x \in I_n}\{f(x)\}} \to 0, \quad \text{as } n \to \infty, \tag{4.16}$$
because $g_n$ is pointwise non-increasing in $n \in \mathbb{N}$, $g_n(x_0) = O(n^{-1/3})$ and $f(x_0) > 0$.
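Display (4.16) is what later allows the denominator $\{\sqrt{f_n} + \sqrt f\}^2$ in (4.15) to be replaced by $4f$. The short sketch below (assuming NumPy; the value $q = 0.3$ for $f(x)$ and the relative perturbations are hypothetical) shows that the pointwise ratio of $(\sqrt p - \sqrt q)^2$ to $(p - q)^2/(4q)$, which equals $4/(\sqrt{1+\epsilon} + 1)^2$ when $p = q(1+\epsilon)$, tends to 1 as the relative perturbation $\epsilon$ shrinks.

```python
import numpy as np

q = 0.3                                   # hypothetical value of f(x)
eps = np.array([1e-1, 1e-2, 1e-3, 1e-4])  # relative perturbations, as in (4.16)
p = q * (1 + eps)                         # plays the role of f_n(x)

exact = (p - q) ** 2 / (np.sqrt(p) + np.sqrt(q)) ** 2  # equals (sqrt(p) - sqrt(q))^2
approx = (p - q) ** 2 / (4 * q)
ratio = exact / approx                    # equals 4 / (sqrt(1 + eps) + 1)^2
print(ratio)
```

Since the deviation $|{\rm ratio} - 1|$ is about $\epsilon/2$, a uniform bound on $|f_n/f - 1|$ over $I_n$ transfers directly to the integrals in (4.15).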
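Also,
$$D_1(n) := \int_{I_n} \{f_n(x) - f(x)\}^2\, dx = \frac{1}{d_n^2} \int_{I_n} \left\{ \theta^2 g_n^2(x) - O(n^{-2/3}) f(x) g_n(x) + O(n^{-4/3}) f^2(x) \right\} dx,$$
and, noticing that
$$0 \le \int_{I_n} \{g_n(x) f(x)\}\, dx \le f(x^{(n)}) \int_{I_n} \{g_n(x)\}\, dx = O(n^{-2/3}),$$
we obtain, by (4.10),
$$n D_1(n) \to \left(\frac23\right)^d \theta^2 b^2 \prod_{i=1}^d \{h_i^3\}, \quad \text{as } n \to \infty. \tag{4.17}$$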



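Now, since $f$ is block-decreasing, we have
$$0 < f\left(x_0 + n^{-1/(3d)} h\right) \le f(x) \le f\left(x_0 - n^{-1/(3d)} h\right)$$
for all $x \in I_n$ and $n \ge n_1$. Hence,
$$\frac{n D_1(n)}{f\left(x_0 - n^{-1/(3d)} h\right)} \le n \int_{I_n} \frac{\{f_n(x) - f(x)\}^2}{f(x)}\, dx \le \frac{n D_1(n)}{f\left(x_0 + n^{-1/(3d)} h\right)},$$
which, together with (4.17) and a sandwich argument, yields
$$n \int_{I_n} \frac{\{f_n(x) - f(x)\}^2}{f(x)}\, dx \to \left(\frac23\right)^d \frac{\theta^2 b^2 \prod_{j=1}^d \{h_j^3\}}{f(x_0)}, \quad \text{as } n \to \infty.$$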
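Applying all of the above to (4.15), and appealing to Lemma 2 of Jongbloed [2000], we get
$$n h^2(f_n, f) = \frac{n}{8} \int_{I_n} \frac{\{f_n(x) - f(x)\}^2}{f(x)}\, dx + o(1) \tag{4.18}$$
as $n \to \infty$. By the preceding display, (4.2) therefore holds with $\alpha^2 = 8^{-1}(2/3)^d \theta^2 b^2 \prod_{j=1}^d \{h_j^3\}/f(x_0)$, and, by (4.13), (4.3) holds for the functional $n^{1/3} T$ with $\beta = (-1)^d b\theta \prod_{j=1}^d \{h_j\}$. Applying Lemma 4.2, we get
$$\liminf_{n\to\infty} \max\left\{ E_{f_n}\left[n^{1/3}|T_n - f_n(x_0)|\right], E_f\left[n^{1/3}|T_n - f(x_0)|\right] \right\} \ge G_{f,x_0}(c, \theta) := \frac14 (-1)^d b\theta c \exp\left( -\frac{2^{d-2}\theta^2 b^2 c^3}{3^d f(x_0)} \right),$$
where $c = \prod_{i=1}^d \{h_i\}$. For a fixed $\theta \in (0,1)$ the maximum of $G_{f,x_0}(c, \theta)$ over $c > 0$ is attained at
$$c(\theta) = \left\{ \frac{3^{d-1} f(x_0)}{2^{d-2}\theta^2 b^2} \right\}^{1/3}$$
and is equal to
$$G_{f,x_0}(c(\theta), \theta) = \frac{e^{-1/3}}{4} \left\{ \frac{3^{d-1}}{2^{d-2}}\, \theta\, (-1)^d \frac{\partial^d f(x)}{\partial x_1 \cdots \partial x_d}\bigg|_{x=x_0} f(x_0) \right\}^{1/3},$$
the latter being an increasing function of $\theta \in (0,1)$.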
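This implies that
$$\liminf_{n\to\infty} \max\left\{ E_{f_n}\left[n^{1/3}|T_n - f_n(x_0)|\right], E_f\left[n^{1/3}|T_n - f(x_0)|\right] \right\} \ge \frac{e^{-1/3}}{4} \left\{ \frac{3^{d-1}}{2^{d-2}}\, \theta\, (-1)^d \frac{\partial^d f(x)}{\partial x_1 \cdots \partial x_d}\bigg|_{x=x_0} f(x_0) \right\}^{1/3}.$$
Overall, we are allowed to take $\theta \uparrow 1$ in the above display, even though $\theta = 1$ is not a valid configuration, yielding the lower bound in the statement of the proposition. The proof is thus complete. ■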
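The final optimization over $c = \prod_i h_i$ can be confirmed numerically. In the sketch below (assuming NumPy; the values of $|b| = (-1)^d b$, $f(x_0)$ and $d$ are hypothetical), `G` is built from the constants $\alpha$ and $\beta$ of the proof as $\frac14\beta e^{-2\alpha^2}$, its maximum over a fine grid in $c$ is compared with the closed forms for $c(\theta)$ and $G_{f,x_0}(c(\theta), \theta)$, and the monotonicity in $\theta$ that justifies letting $\theta \uparrow 1$ is checked.

```python
import numpy as np

def G(c, theta, abs_b, f_x0, d):
    """(1/4) beta exp(-2 alpha^2), with beta = theta |b| c and
    alpha^2 = (2/3)^d theta^2 b^2 c^3 / (8 f(x0)), as in the proof."""
    beta = theta * abs_b * c
    alpha_sq = (2 / 3) ** d * theta ** 2 * abs_b ** 2 * c ** 3 / (8 * f_x0)
    return 0.25 * beta * np.exp(-2 * alpha_sq)

def c_star(theta, abs_b, f_x0, d):
    """Closed-form maximizer c(theta)."""
    return (3 ** (d - 1) * f_x0 / (2 ** (d - 2) * theta ** 2 * abs_b ** 2)) ** (1 / 3)

def G_max(theta, abs_b, f_x0, d):
    """Closed-form maximum (e^{-1/3}/4) {3^{d-1}/2^{d-2} theta |b| f(x0)}^{1/3}."""
    return np.exp(-1 / 3) / 4 * (3 ** (d - 1) / 2 ** (d - 2) * theta * abs_b * f_x0) ** (1 / 3)

# Hypothetical values: |b| = (-1)^d b = 0.7, f(x0) = 0.3, d = 3.
abs_b, f_x0, d = 0.7, 0.3, 3
cs = np.linspace(1e-3, 10.0, 200001)  # grid of candidate values of c

rows = []
for theta in (0.25, 0.5, 0.9):
    grid_max = G(cs, theta, abs_b, f_x0, d).max()
    at_c_star = G(c_star(theta, abs_b, f_x0, d), theta, abs_b, f_x0, d)
    rows.append((theta, grid_max, at_c_star, G_max(theta, abs_b, f_x0, d)))
```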

5. Discussion and open problems. Once consistency has been established, interest focuses on rates of convergence of the MLE and other properties, including the behavior of $\hat f_n$ at zero and pointwise limiting distributions. We have the following conjectures concerning the MLE $\hat f_n$ for the class $\mathcal{F}_{SMU}(d)$. Work is currently underway on all of these further problems.

Conjecture 1. If $f_0(0) < \infty$, then we conjecture that $P_0(\hat f_n(0) \le M(\log n)^{d-1}) \to 1$ for some $M > 0$.

Conjecture 2. If $f_0(0) < \infty$ and $f_0$ is concentrated on $[0, M]^d$ for some $0 < M < \infty$, then $h(\hat f_n, f_0) = O_p(n^{-1/3}(\log n)^\gamma)$ for some $\gamma$ depending only on $d$.

Concerning rates of convergence of the estimators at a fixed point, we do not yet have any upper bound results to accompany the lower bound results of Proposition 4.1. Thus there remain the following two possibilities: (a) the pointwise rate of convergence under Assumption 4.1 is $n^{1/3}$, and we expect convergence in distribution with the rate $n^{1/3}$; or (b) the lower bound given in Proposition 4.1 is not yet sharp, and we should expect log terms in the rate (as might be expected from the covering number results of Blei et al. [2007]). Our corresponding conjectures for these two possible scenarios are given below as Conjectures 3a and 3b, respectively.

Conjecture 3a. Suppose that $f_0$ has $\partial^d f_0(x)/\partial x_1 \cdots \partial x_d$ continuous in a neighborhood of $x_0$ with
$$\partial^d f_0(x_0) := \frac{\partial^d f_0(x)}{\partial x_1 \cdots \partial x_d}\bigg|_{x=x_0} \ne 0.$$
Let $\{W(t) : t \in \mathbb{R}^d\}$ be a $2^d$-sided Brownian sheet process on $\mathbb{R}^d$ and let
$$Y(t) = \sqrt{f_0(x_0)}\, W(t) + \frac{1}{2^d}(-1)^d \partial^d f_0(x_0)\, |t|^2.$$
Then, in keeping with our lower bound results of Section 4, we conjecture that
$$n^{1/3}(\hat f_n(x_0) - f_0(x_0)) \to_d \partial^d H(t)\big|_{t=0},$$
where the process $H$ is determined by
(i) $H(t) \ge Y(t)$ for all $t \in \mathbb{R}^d$,
(ii) $\int (H(t) - Y(t))\, d(\partial^d H(t)) = 0$, and
(iii) $V_{\partial^d H}[u, v] \ge 0$ for all $u \le v \in \mathbb{R}^d$.
Partial results concerning Conjecture 3a were obtained in Pavlides [2008].
Conjecture 3b. As suggested in part by the covering number results of Blei, Gao and Li [2007], the pointwise rate of convergence is $(n/(\log n)^{\cdots})^{1/3}$. This would entail an improved version of Proposition 4.1. In this case we do not yet have conjectures concerning the limiting distribution.

Acknowledgments: We owe thanks to Marina Meila, Fritz Scholz, and 
Arseni Seregin for helpful discussions concerning the proof of uniqueness, 
and especially Lemmas 3.3 and 3.4. 